771

Resource-Efficient Machine Learning Systems: From Natural Behavior to Natural Language

Biderman, Dan January 2024 (has links)
Contemporary machine learning models exhibit unprecedented performance in the text, vision, and time-series domains, but at the cost of significant computational and human resources. Applying these technologies to science requires balancing accuracy and resource allocation, which I investigate here via three case studies. In Chapter 1, I present a deep learning system for animal pose estimation from video. Existing approaches rely on frame-by-frame supervised deep learning, which requires extensive manual labeling, fails to generalize to data far outside of its training set, and occasionally produces scientifically critical errors that are hard to detect. The solution proposed here includes semi-supervised learning on unlabeled videos, video-centric network architectures, and a post-processing step that combines network ensembling and state-space modeling. These methods improve performance with both scarce and abundant labels, and are implemented in an easy-to-use software package and cloud application. In Chapter 2, I turn to the Gaussian process, a canonical nonparametric model known for its poor scaling with dataset size. Existing methods accelerate Gaussian processes at the cost of modeling biases. I analyze two common techniques -- early truncated conjugate gradients and random Fourier features -- showing that they find hyperparameters that underfit and overfit the data, respectively. I then propose to eliminate these biases, in exchange for increased variance, via randomized truncation estimators. In Chapter 3, I investigate continual learning, or "finetuning", in large language models (LLMs) with billions of weights. Training these models requires more memory than is typically available in academic clusters. Low-Rank Adaptation (LoRA) is a widely used technique that saves memory by training only low-rank perturbations to selected weight matrices in a so-called "base model". I compare the performance of LoRA and full finetuning on two target domains, programming and mathematics, across different data regimes. I find that in most common settings LoRA underperforms full finetuning, but it nevertheless exhibits a desirable form of regularization: it better maintains the base model's performance on tasks outside the target domain. I then propose best practices for finetuning with LoRA. In summary, applying state-of-the-art models to large scientific datasets necessitates taking computational shortcuts. This thesis highlights the implications of these shortcuts and emphasizes the need for careful empirical and theoretical investigation to find favorable trade-offs between accuracy and resource allocation.
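The LoRA scheme studied in Chapter 3 is compact enough to sketch directly. The following is a minimal, illustrative PyTorch module for training only a low-rank perturbation on top of a frozen weight matrix; the class and parameter names are our own, not the thesis code, and the zero-initialized B factor follows the common LoRA convention so that training starts exactly at the base model.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update:
    h = W x + (alpha / r) * B A x, with rank r << min(dims)."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():          # freeze the base weights
            p.requires_grad = False
        out_f, in_f = base.weight.shape
        self.A = nn.Parameter(torch.randn(r, in_f) * 0.01)  # small random init
        self.B = nn.Parameter(torch.zeros(out_f, r))        # zero init: no perturbation at start
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T
```

Because only A and B receive gradients, optimizer state and gradient memory scale with the rank r rather than with the full weight matrix, which is what makes finetuning billion-parameter models feasible on modest hardware.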
772

New Parameterizations of Bayesian Networks and Their Implicit Estimation: Natural Exponential Family and Infinite Mixture of Gaussians

Jarraya Siala, Aida 26 October 2013 (has links) (PDF)
Learning a Bayesian network consists of estimating the graph (the structure) and the parameters of the conditional probability distributions associated with that graph. In practice, Bayesian network learning algorithms use a classical Bayesian a posteriori estimation approach whose prior parameters are often set by an expert or defined uniformly. The core of this thesis concerns the application to Bayesian networks of several advances from statistics, such as implicit estimation, natural exponential families, and infinite mixtures of Gaussians, with the aim of (1) proposing new parametric forms, (2) estimating the parameters of such models, and (3) learning their structure.
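As background for the statistical machinery named above, a natural exponential family has the standard density form below; this is textbook notation, not notation taken from the thesis.

```latex
% Natural exponential family with natural parameter \theta and
% cumulant (log-partition) function F(\theta):
f(x \mid \theta) = \exp\bigl(\theta x - F(\theta)\bigr)\, h(x),
\qquad \mathbb{E}[X] = F'(\theta), \qquad \operatorname{Var}(X) = F''(\theta).
```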
773

On Some Unsupervised Problems Involving Highly Dependent Time Series

Khaleghi, Azadeh 18 November 2013 (has links) (PDF)
This thesis is devoted to the theoretical analysis of unsupervised problems involving highly dependent time series. More specifically, we address two fundamental problems: change-point estimation and time-series clustering. These problems are treated in an extremely general framework in which the data are generated by stationary ergodic stochastic processes. This is one of the weakest assumptions in statistics, subsuming not only the modeling and parametric assumptions common in the literature, but also classical assumptions of independence, bounded memory, and mixing. In particular, no restriction is placed on the form or nature of the dependencies, so the samples may be arbitrarily dependent. For each problem addressed, we propose new nonparametric methods and prove that they are asymptotically consistent in this framework. For change-point estimation, asymptotic consistency refers to the algorithm's ability to produce estimates that are asymptotically arbitrarily close to the true change points. A clustering algorithm, in turn, is asymptotically consistent if the clustering it produces, restricted to each batch of sequences, eventually coincides with the target clustering. We show that the proposed algorithms can be implemented efficiently, and we accompany our theoretical results with experimental evaluations. Statistical analysis in the stationary ergodic framework is extremely difficult: in general, convergence rates are provably impossible to obtain. Moreover, for two samples generated independently by stationary ergodic processes, it is provably impossible to distinguish the case where the samples are generated by the same process from the case where they are generated by different processes. This implies that problems such as time-series clustering without knowledge of the number of clusters, or change-point estimation without knowledge of the number of change points, cannot admit consistent solutions. A difficult task, then, is to discover formulations of these problems that do admit solutions in this general framework. The main contribution of this thesis is to demonstrate, by construction, that despite these theoretical impossibility results, natural formulations of the problems considered exist and admit consistent solutions in this general framework. This includes showing that the correct number of change points can be found without resorting to stronger assumptions on the stochastic processes; as a result, in this formulation, the change-point problem can be reduced to time-series clustering. The results presented in this work lay the theoretical foundations for the analysis of sequential data in a much broader range of applications.
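Consistency results in this line of work are typically built on an empirical distributional distance between time series, estimated from the frequencies of short patterns over quantized values. The sketch below is our own toy version of that idea (discrete alphabets, a few pattern lengths, dyadic weights), not the estimator from the thesis.

```python
from collections import Counter

def pattern_freqs(seq, k):
    """Empirical frequencies of all length-k patterns in a sequence."""
    counts = Counter(tuple(seq[i:i + k]) for i in range(len(seq) - k + 1))
    total = max(len(seq) - k + 1, 1)
    return {p: c / total for p, c in counts.items()}

def empirical_distance(x, y, max_k=3):
    """Toy distributional distance: weighted sum, over pattern lengths k,
    of total-variation-style differences in pattern frequencies."""
    d = 0.0
    for k in range(1, max_k + 1):
        fx, fy = pattern_freqs(x, k), pattern_freqs(y, k)
        patterns = set(fx) | set(fy)
        d += 2.0 ** -k * sum(abs(fx.get(p, 0) - fy.get(p, 0)) for p in patterns)
    return d
```

For stationary ergodic processes, pattern frequencies converge almost surely, which is what lets distances of this kind drive consistent clustering without independence or mixing assumptions.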
774

Analysis and Mining of Moving-Object Trajectory Data

El Mahrsi, Mohamed Khalil 30 September 2013 (has links) (PDF)
In this thesis, we explore two research problems related to the management and mining of moving-object trajectory data. First, we study the sampling of trajectory streams. Modern geolocation devices can record and transmit their geographic coordinates at a very high rate. Keeping the entirety of the trajectories captured by these devices can be costly in both storage space and computation time. Suitable sampling techniques therefore become essential for reducing data volume by discarding some positions (judged useless or redundant) while preserving as much as possible of the spatiotemporal characteristics of the original trajectories. In a streaming context, these techniques must moreover be executed on the fly and adapt to the continuous and ephemeral nature of the data. To meet these needs, we propose the STSS (Spatiotemporal Stream Sampling) algorithm. STSS has low time complexity and guarantees an upper bound on the error introduced by sampling. We also present an experimental study showing the performance of our proposal and comparing it with other approaches from the literature. The second problem studied in this work is the unsupervised classification (clustering) of trajectories constrained by a road network. Most work on trajectory clustering has addressed the case where trajectories move freely in a Euclidean space, and thus has not taken into account the possible presence of a network underlying the movement, whose constraints play a key role in evaluating the similarity between trajectories. We propose three approaches to handle this case. The first focuses on discovering groups of trajectories that traveled the same parts of the road network. The second aims to group road segments that are frequently visited by the same trajectories. The third combines the two aspects to perform a simultaneous co-clustering of trajectories and road segments. We illustrate our approaches through various case studies to show how they can characterize road traffic and movement dynamics in the road network, and we carry out experimental studies to evaluate the performance of our proposals.
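As a rough illustration of what on-the-fly trajectory sampling involves (a simplified toy, not the published STSS algorithm), the sketch below keeps a point only when dropping it would move the linearly interpolated trajectory by more than a spatial tolerance, giving a simple per-point bound on the sampling error.

```python
def stream_sample(points, eps):
    """points: iterable of (t, x, y). Keep a point only if dropping it
    would deviate from the interpolated trajectory by more than eps.
    A toy streaming sampler, not the thesis's STSS algorithm."""
    kept, pending = [], None
    for p in points:
        if not kept:
            kept.append(p)              # always keep the first point
            continue
        if pending is not None and _deviation(kept[-1], pending, p) > eps:
            kept.append(pending)        # dropping `pending` would exceed eps
        pending = p
    if pending is not None:
        kept.append(pending)            # always keep the last point
    return kept

def _deviation(a, m, b):
    """Distance of m from the segment a-b interpolated at m's timestamp."""
    (ta, xa, ya), (tm, xm, ym), (tb, xb, yb) = a, m, b
    w = 0.0 if tb == ta else (tm - ta) / (tb - ta)
    ex, ey = xa + w * (xb - xa), ya + w * (yb - ya)
    return ((xm - ex) ** 2 + (ym - ey) ** 2) ** 0.5
```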
775

Predicting Consumer Purchase Behavior Using Automatic Machine Learning: A Case Study in Online Purchase Flows / Prediktering av Konsumentbeteenden med Automatisk Maskininlärning: En fallstudie i onlinebaserade köpflöden

Sandström, Olle January 2022 (has links)
Online payment purchase flows are designed to make the user experience as effective and smooth as possible. The user is at the center of this process and, to a certain degree, decides whether the purchase will eventually be placed. What is left to the payment provider is enabling an efficient purchase flow in which information must be collected for various purposes. To design these purchase flows as efficiently as possible, this thesis investigates whether and how consumer purchase behavior can be predicted: which algorithms perform best at modeling the outcome, and which underlying features can be used to model it? The features are graded by their feature importance to see how, and how much, they affect the best-performing model. To investigate consumer behavior, the task was set up as a supervised binary classification problem modeling the outcome of user purchase sessions: either a session results in a purchase or it does not. Several automatic machine learning (also referred to as automated machine learning) frameworks were considered before H2O AutoML was chosen for its historical performance on other supervised binary classification problems. The dataset contained information from user sessions relating to the consumer, the transaction, and the time the purchase was initiated. These variables were in either numerical or categorical format and were evaluated using the SHAP importance metric as well as an aggregated SHAP summary plot, which describes how features affect the model. The results show that the Distributed Random Forest algorithm performed best, yielding a 26-percentage-point improvement in accuracy over an undersampled 50% baseline when predicting whether a session will be converted into a purchase. Two of the most important features according to the model were categorical features at the intersection of consumer and transaction information; a time-based categorical variable also proved important in the model's predictions. The research further shows that automatic machine learning has come a long way in the pre-processing of variables, enabling developers to deploy these kinds of machine learning models more efficiently. The results echo earlier findings confirming that consumer purchase behavior, and in particular the outcome of a purchase-flow session, can be predicted. This implies that payment providers could hypothetically use such insights and predictions when developing their flows, catering to specific groups of consumers and enabling a more efficient and personalized payment flow.
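For context, an H2O AutoML run of the kind described here looks roughly like the sketch below; the file name, column names, and settings are placeholders, not the study's actual data or configuration.

```python
import h2o
from h2o.automl import H2OAutoML

h2o.init()

# Placeholder dataset: one row per purchase session, with a binary
# "purchased" label and mixed numerical/categorical features.
sessions = h2o.import_file("sessions.csv")
sessions["purchased"] = sessions["purchased"].asfactor()  # classification target
train, test = sessions.split_frame(ratios=[0.8], seed=42)

# Let AutoML search over algorithms (GBM, DRF, GLM, deep learning, ensembles)
aml = H2OAutoML(max_models=20, seed=42, sort_metric="AUC")
aml.train(y="purchased", training_frame=train)

print(aml.leaderboard.head())                    # ranked candidate models
print(aml.leader.model_performance(test).auc())  # held-out performance
```

In the thesis, a Distributed Random Forest model topped this kind of leaderboard, and its features were then inspected with SHAP importance values.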
776

Trojan Attacks and Defenses on Deep Neural Networks

Yingqi Liu (13943811) 13 October 2022 (has links)
With the fast spread of machine learning techniques, sharing and adopting public deep neural networks has become very popular. Because deep neural networks are not intuitive for humans to understand, malicious behaviors can be injected into them undetected. We call this a trojan attack, or backdoor attack, on neural networks. Trojaned models operate normally when regular inputs are provided, but misclassify to a specific output label when the input is stamped with a special pattern called a trojan trigger. Deploying trojaned models can have severe consequences, including endangering human lives (in applications such as autonomous driving). Trojan attacks on deep neural networks introduce two challenges. From the attacker's perspective, since the training data or training process is usually not accessible, the attacker needs a way to carry out the trojan attack without access to training data. From the user's perspective, the user needs to quickly scan public deep neural networks online and detect trojaned models.

We address these challenges in this dissertation. For trojan attacks without access to training data, we propose inverting the neural network to generate a general trojan trigger, and then retraining the model with reverse-engineered training data to inject malicious behaviors into it. The malicious behaviors are activated only by inputs stamped with the trojan trigger. To scan for and detect trojaned models, we develop a novel technique that analyzes inner neuron behaviors by determining how output activations change when different levels of stimulation are introduced to a neuron. A trojan trigger is then reverse-engineered through an optimization procedure using the stimulation analysis results, to confirm that a neuron is truly compromised. Furthermore, for complex trojan attacks, we propose a novel complex-trigger detection method that leverages a symmetric feature differencing method to distinguish the features of injected complex triggers from natural features. For trojan attacks on NLP models, we propose a novel backdoor scanning technique that transforms a subject model into an equivalent but differentiable form, inverts a distribution of words denoting their likelihood of being in the trigger, and applies a novel word discriminativity analysis to determine whether the subject model is particularly discriminative for the presence of likely trigger words.
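The optimization step described above, reverse-engineering a trigger, can be sketched generically. The snippet below is a simplified illustration of optimization-based trigger inversion (fixed patch location, no mask optimization), not the dissertation's actual procedure; `model`, the image batch, and the target label are assumed inputs.

```python
import torch

def invert_trigger(model, images, target_label, steps=200, lr=0.1, size=8):
    """Optimize a small patch so that stamping it on clean images drives
    the model toward `target_label`. A minimal sketch of optimization-based
    trigger inversion, not the dissertation's code."""
    model.eval()
    trigger = torch.zeros(1, images.shape[1], size, size, requires_grad=True)
    opt = torch.optim.Adam([trigger], lr=lr)
    target = torch.full((images.shape[0],), target_label, dtype=torch.long)
    for _ in range(steps):
        stamped = images.clone()
        stamped[:, :, :size, :size] = torch.sigmoid(trigger)  # keep pixels in [0, 1]
        loss = torch.nn.functional.cross_entropy(model(stamped), target)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.sigmoid(trigger).detach()
```

A scanner can run this inversion for each candidate output label: if a tiny, input-agnostic patch reliably flips predictions to one label, that label is likely backdoored.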
777

Neural Sequence Modeling for Domain-Specific Language Processing: A Systematic Approach

Zhu, Ming 14 August 2023 (has links)
In recent years, deep learning based sequence modeling (neural sequence modeling) techniques have made substantial progress on many tasks, including information retrieval, question answering, information extraction, and machine translation. Benefiting from the highly scalable attention-based Transformer architecture and enormous amounts of open-access online data, large-scale pre-trained language models have shown great modeling and generalization capacity for sequential data. However, not all domains benefit equally from the rapid development of neural sequence modeling. Domains like healthcare and software engineering have vast amounts of sequential data containing rich knowledge, yet remain under-explored due to a number of challenges: 1) the distribution of sequences in specific domains differs from the general domain; 2) effective comprehension of domain-specific data usually relies on domain knowledge; and 3) labeled data is usually scarce and expensive to obtain in domain-specific settings. In this thesis, we focus on the research problem of applying neural sequence modeling methods to address both common and domain-specific challenges from the healthcare and software engineering domains. We systematically investigate neural-based machine learning approaches to address the above challenges in three research directions: 1) learning with long sequences, 2) learning from domain knowledge, and 3) learning under limited supervision. Our work can also potentially benefit other domains with large amounts of sequential data. / Doctor of Philosophy / In the last few years, computer programs that learn and understand human languages (an area called machine learning for natural language processing) have improved significantly. These advances are visible in areas such as retrieving information, answering questions, extracting key details from texts, and translating between languages. A key to these successes has been a type of neural network structure known as a "Transformer", which can process and learn from large amounts of information found online. However, these successes are not uniform across all areas. Two fields, healthcare and software engineering, still present unique challenges despite having a wealth of information. These challenges include the different types of information in these fields, the need for specific expertise to understand it, and the shortage of labeled data, which is crucial for training machine learning models. In this thesis, we focus on machine learning for natural language processing methods to solve these challenges in the healthcare and software engineering fields. Our research investigates learning with long documents, learning from domain-specific expertise, and learning when labeled data is scarce. The insights and techniques from our work could potentially be applied to other fields that also have large amounts of sequential data.
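The "highly scalable attention-based Transformer architecture" mentioned above is built around scaled dot-product attention, shown below in its standard textbook form; this is generic background code, not code from the thesis.

```python
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """Standard attention: softmax(Q K^T / sqrt(d)) V.
    q, k, v: (batch, seq_len, d) tensors."""
    d = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d ** 0.5      # (batch, seq, seq)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)          # attention distribution
    return weights @ v                               # (batch, seq, d)
```

The (seq_len, seq_len) score matrix makes the cost quadratic in sequence length, which is exactly why learning with long sequences, the first research direction above, is challenging.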
778

AN ADAPTIVE RULE-BASED SYSTEM

Stackhouse, Christian Paul, 1960- January 1987 (has links)
Adaptive systems are systems whose characteristics evolve over time to improve their performance at a task. A fairly new area of study is that of adaptive rule-based systems. The system studied for this thesis uses meta-knowledge about rules, rulesets, rule performance, and system performance in order to improve its overall performance in a problem domain. An interesting and potentially important phenomenon emerged: the complexity the system acquires while solving a problem appears to be limited by an inherent break-even level, at which the benefit of additional complexity no longer exceeds its cost for that problem. If the problem is made more difficult, more complexity is required, the benefit of complexity again exceeds its cost, and the system's complexity increases, ultimately reaching the new break-even point. There is no apparent ultimate limit to the complexity attainable.
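This break-even dynamic can be illustrated with a toy model in which rules are added only while the marginal benefit of one more rule exceeds its marginal cost; the cost and benefit functions below are invented for illustration and are not the thesis's system.

```python
import math

def grow_ruleset(benefit, cost, max_rules=1000):
    """Add rules while the marginal benefit of one more rule exceeds its
    marginal cost; returns the break-even complexity (number of rules).
    benefit(n), cost(n): cumulative benefit/cost of an n-rule system."""
    n = 0
    while n < max_rules and benefit(n + 1) - benefit(n) > cost(n + 1) - cost(n):
        n += 1
    return n

# Toy example: diminishing returns vs. linear bookkeeping cost.
# Scaling up the problem pushes the break-even point higher, mirroring
# the phenomenon described in the abstract.
for scale in (10, 20, 40):
    n = grow_ruleset(lambda n, s=scale: s * math.log(1 + n), lambda n: 0.5 * n)
    print(f"problem scale {scale}: break-even at {n} rules")
```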
779

Graph-based protein-protein interaction prediction in Saccharomyces cerevisiae

Paradesi, Martin Samuel Rao January 1900 (has links)
Master of Science / Department of Computing and Information Sciences / Doina Caragea / William H. Hsu / The term 'protein-protein interaction (PPI)' refers to the study of associations between proteins as manifested through biochemical processes such as formation of structures, signal transduction, transport, and phosphorylation. PPIs play an important role in the study of biological processes. Many PPIs have been discovered over the years, and several databases have been created to store information about these interactions. von Mering (2002) states that about 80,000 interactions between yeast proteins are currently available from various high-throughput interaction detection methods. Determining PPIs using high-throughput methods is not only expensive and time-consuming but also generates a high number of false positives and false negatives. Therefore, there is a need for computational approaches that can help identify real protein interactions. Several methods have been designed to address the task of predicting protein-protein interactions using machine learning. Most of them use features extracted from protein sequences (e.g., amino acid composition) or associated with protein sequences directly (e.g., GO annotation). Others use relational and structural features extracted from the PPI network, along with features related to the protein sequence. When using the PPI network to design features, several node and topological features can be extracted directly from the associated graph. In this thesis, important graph features of a protein interaction network that help in predicting protein interactions are identified. Two previously published datasets are used in this study, and a third dataset has been created by combining three PPI databases. Several classifiers are applied to the graph attributes extracted from the protein interaction networks of these three datasets. A detailed study has been performed to determine whether graph attributes extracted from a protein interaction network are more predictive than biological features of protein interactions. The results indicate that performance measures (such as sensitivity, specificity, and AUC score) improve when graph features are combined with biological features.
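Extracting the kind of graph attributes described here from a PPI network is straightforward with a graph library. The sketch below, using networkx with made-up protein IDs, computes a few common node and pair features for a candidate interaction; it illustrates the general approach, not the thesis's exact feature set.

```python
import networkx as nx

# Toy PPI network; node labels are illustrative yeast-style protein IDs.
G = nx.Graph([("YAL001C", "YBR123W"), ("YAL001C", "YCR042C"),
              ("YBR123W", "YCR042C"), ("YCR042C", "YDL140C")])

def pair_features(G, u, v):
    """Graph attributes for a candidate interaction (u, v):
    degrees, clustering coefficients, shared neighbors, Jaccard overlap."""
    nu, nv = set(G[u]), set(G[v])
    union = nu | nv
    return {
        "deg_u": G.degree(u),
        "deg_v": G.degree(v),
        "clust_u": nx.clustering(G, u),
        "clust_v": nx.clustering(G, v),
        "common_neighbors": len(nu & nv),
        "jaccard": len(nu & nv) / len(union) if union else 0.0,
    }

print(pair_features(G, "YAL001C", "YDL140C"))
```

Feature vectors like these, computed for known interacting and non-interacting pairs, become the training rows for the classifiers mentioned in the abstract.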
780

A pilot study to integrate HIV drug resistance gold standard interpretation algorithms using neural networks

Singh, Y., Mars, M. January 2013 (has links)
Published Article / There are several HIV drug resistance interpretation algorithms, which can produce different resistance measures even when applied to the same resistance profile. This discrepancy creates confusion for the physician when choosing the best ARV therapy.
