Global ETD Search

21	A Multitask Learning Encoder-N-Decoder Framework for Movie and Video Description Nina, Oliver A., Nina 11 October 2018 (has links) No description available. Computer Science Computer Engineering Electrical Engineering
22	Can I open it? : Robot Affordance Inference using a Probabilistic Reasoning Approach Aguirregomezcorta Aina, Jorge January 2024 (has links) Modern autonomous systems should be able to interact with their surroundings in a flexible yet safe manner. To guarantee this behavior, such systems must learn how to approach unseen entities in their environment through the inference of relationships between actions and objects, called affordances. This research project introduces a neuro-symbolic AI system capable of inferring affordances using attribute detection and knowledge representation as its core principles. The attribute detection module employs a visuo-lingual image captioning model to extract the key object attributes of a scene, while the cognitive knowledge module infers the affordances of those attributes using conditional probability. The practical capabilities of the neuro-symbolic AI system are assessed by implementing a simulated robot system that interacts within the problem space of jars and bottles. The system's caption-inferring capabilities are evaluated using image captioning and machine translation metrics. The scores registered in the evaluation show a successful attribute captioning rate of more than 71%. The robot simulation is evaluated within a Unity virtual environment by interacting with 50 jars and bottles, equally divided between lifting and twisting affordances. The robot system successfully interacts with all the objects in the scene due to the robustness of the architecture but fails in the inference process in 24 of the 50 iterations. Contrary to previous works approaching the problem as a classification task, this study shows that affordance inference can be successfully implemented using a cognitive visuo-lingual method. The study’s results justify further study into the use of neuro-symbolic AI approaches to affordance inference.
Neuro-symbolic AI Knowledge Representation Image Captioning Affordances Simulated Robotics
23	Popis fotografií pomocí rekurentních neuronových sítí / Image Captioning with Recurrent Neural Networks Kvita, Jakub January 2016 (has links) This thesis deals with the automatic generation of image descriptions using several kinds of neural networks. The work is based on papers from the MS COCO Captioning Challenge 2015 and on character-level language models popularized by A. Karpathy. The proposed model combines a convolutional and a recurrent neural network in an encoder-decoder architecture. The vector representing the encoded image is passed to the language model as the memory values of the LSTM layers in the network. The thesis examines how well a model with such a simple architecture can describe images and how it compares with other current models. One of its conclusions is that the proposed architecture is not sufficient for any kind of image description.
24	Learning Embeddings for Fashion Images Hermansson, Simon January 2023 (has links) Today the process of sorting second-hand clothes and textiles is mostly manual. In this master’s thesis, methods for automating this process as well as improving the manual sorting process have been investigated. The methods explored include the automatic prediction of price and intended usage for second-hand clothes, as well as different types of image retrieval to aid manual sorting. Two models were examined: CLIP, a multi-modal model, and MAE, a self-supervised model. Quantitatively, the results favored CLIP, which outperformed MAE in both image retrieval and prediction. However, MAE may still be useful for some applications in terms of image retrieval as it returns items that look similar, even if they do not necessarily have the same attributes. In contrast, CLIP is better at accurately retrieving garments with as many matching attributes as possible. For price prediction, the best model was CLIP. When fine-tuned on the dataset used, CLIP achieved an F1-Score of 38.08 using three different price categories in the dataset. For predicting the intended usage (either reusing the garment or exporting it to another country) the best model managed to achieve an F1-Score of 59.04. Computer Vision Machine Learning Image Retrieval CLIP Masked Autoencoders (MAE) Vision Transformers Image Captioning Price Prediction AI for Fashion
25	Parameter-efficient modeling and robust automatic evaluation of image captioning Ahmadi, Saba 10 1900 (has links) Image captioning is the artificial intelligence (AI) task of describing images in natural language. This AI task has several useful societal applications, such as accessibility for the visually impaired, automated content generation, human-robot interaction, and medical imaging analysis. Over the last eight years, image captioning research has seen tremendous progress in building strong models, collecting large-scale datasets, and developing automatic evaluation metrics. Despite this progress, image captioning research faces two major challenges: 1) how to build parameter-efficient models, and 2) how to build robust automatic evaluation metrics. In this thesis, we make contributions towards tackling each of these challenges. First, we propose a parameter-efficient method (MAPL) that adapts pre-trained unimodal vision-only and language-only models for the multimodal task of image captioning. MAPL learns a lightweight mapping between the representation spaces of the unimodal models. Thus, MAPL can leverage the strong generalization capabilities of the pre-trained unimodal models for multimodal tasks such as image captioning. Second, we present a systematic study of the robustness of recently proposed image captioning evaluation metrics. Even though these metrics correlate well with human judgments, we found that they are not robust in identifying fine-grained errors in model-generated captions. Caution therefore needs to be exercised when using these metrics for image captioning evaluation. We hope our findings will guide further improvements in the automatic evaluation of image captioning.
Parameter-efficient Image captioning Evaluation Metrics Reference-free
26	Deep Understanding of Technical Documents: Automated Generation of Pseudocode from Digital Diagrams & Analysis/Synthesis of Mathematical Formulas Gkorgkolis, Nikolaos January 2022 (has links) No description available. Artificial Intelligence Computer Science Technical Documents Pseudocode Mathematical Formulas Mathematical Expressions Machine Learning Metadata Artificial Intelligence Semantic Understanding Graph Classification Image Captioning
27	同時的な独話音声要約に基づくリアルタイム字幕生成 / Real-time caption generation based on simultaneous summarization of monologue speech 大野, 誠寛, 松原, 茂樹, 柏岡, 秀紀, 稲垣, 康善 07 1900 (has links) (PDF) Notice for the use of this material: The copyright of this material is retained by the Information Processing Society of Japan (IPSJ). This material is published on this web site with the agreement of the author(s) and the IPSJ. Please comply with the Copyright Law of Japan and the Code of Ethics of the IPSJ if you wish to reproduce, make derivative works of, distribute, or make available to the public any part or the whole thereof. All Rights Reserved, Copyright (C) Information Processing Society of Japan. Comments are welcome. Mail to address: editj<at>ipsj.or.jp, please. captioning incremental parsing dependency parsing monologue spoken language
28	Learning visual representations with neural networks for video captioning and image generation Yao, Li 12 1900 (has links) No description available. neural networks representation learning video captioning unsupervised learning supervised learning visual representation
29	Medical image captioning based on Deep Architectures / Medicinsk bild textning baserad på Djupa arkitekturer Moschovis, Georgios January 2022 (has links) Diagnostic Captioning is described as “the automatic generation of a diagnostic text from a set of medical images of a patient collected during an examination” [59]. It can help inexperienced doctors and radiologists reduce clinical errors and help experienced professionals increase their productivity. Tools that help medical doctors produce higher-quality reports in less time could therefore be of high interest to medical imaging departments and could significantly impact deep learning research in the biomedical domain, which makes the field particularly interesting to industry practitioners and researchers alike. In this work, I attempted to develop Diagnostic Captioning systems based on novel Deep Learning approaches, in order to investigate to what extent neural networks are capable of tagging medical images and of automatically generating a diagnostic text from a set of medical images. The first step toward this objective is concept detection, which boils down to predicting the relevant tags for X-ray images; the ultimate goal is caption generation. To this end, I participated in the ImageCLEFmedical 2022 evaluation campaign, addressing both the concept detection and the caption prediction tasks with baselines based on deep neural networks, including image encoders, classifiers, and text generators, in order to obtain a quantitative measure of the performance of my proposed architectures [28]. My contribution to the campaign, made as part of this work and on behalf of the NeuralDynamicsLab group at KTH Royal Institute of Technology, within the School of Electrical Engineering and Computer Science, where I worked as lead research engineer, ranked 4th in the former and 5th in the latter task [55, 68] among 12 groups included within the top-10 best performing submissions in both tasks.
Artificial Neural Networks Deep Learning Speech and language technology Natural Language Processing (NLP) Deep networks Generative deep networks Convolutional neural networks (CNN) Text generation Information retrieval Diagnostic captioning Image captioning concept prediction classification image encoders transformers Encoder-Decoder architecture abstractive summarization Computer and Information Sciences
30	數位電視平台與弱勢團體媒體近用：以公共電視台服務聽障社群為例 / Digital TV platform and the right of media access of underprivileged group: Take PTS service for hearing impaired community as example 陳慧汶 Unknown Date (has links) The main purpose of this study is to examine how the right of media access is put into practice in other countries, in order to provide a reference for Taiwan’s Public Service Broadcasting (PSB) and to advance the communication rights of the hearing-impaired community. Captions and sign language are the most important access services through which hearing-impaired people obtain information. Under digital convergence, both can be delivered in “closed” form, which benefits hearing-impaired and hearing audiences alike and lowers the related technical costs for broadcasters. Most developed countries build their access service efforts around the values and goals of their public service broadcasters, and expect digital TV technology to better match the demand for and supply of assistive applications for hearing-impaired people. As a result, the provision of access services has shifted from passive to active and optimistic, bringing digital inclusion closer. The study finds that the United Kingdom and the European Union attach importance to regulation, practical implementation, and technical development alike. They hold that, in the era of digital TV platforms, hearing-impaired people should be helped to take part in an e-inclusive society, and that strengthening their communication rights affirms their equal status as citizens. Taiwan’s Public Television Service (PTS) has always taken the BBC as its benchmark and hopes to reach the BBC’s standard of access services once resources are sufficient. However, because of several interwoven factors, PTS still provides access services for hearing-impaired people much as it did in the analog TV era. Although circumstances have not changed much so far, the study found that PTS may add further access services as digital TV arrives: PTS continues to produce two sign-presented programs and plans to extend sign language service to sports programs; for captions, it is working toward facial expression captions and real-time captions; and the HiHD digital channel is to introduce closed captioning in 2011. On its Internet platform, PTS mainly offers program promotion and related information. Barrier-free web space and an access group are regarded as a public obligation and as necessary for strengthening hearing-impaired people’s Internet access rights, but no resources have yet been planned or invested, so much remains to be done before they can be realized.
Public Service Broadcasting the right of media access digital TV platform closed captioning closed signing hearing impaired community
