Global ETD Search

531	Optimierung von Algorithmen zur Videoanalyse: Ein Analyseframework für die Anforderungen lokaler Fernsehsender [Optimization of Algorithms for Video Analysis: An Analysis Framework for the Requirements of Local Television Stations] Ritter, Marc 02 February 2015 The archives of local television stations often comprise several tens of thousands of video cassettes. Modern methods are needed to index the content of such collections automatically. Finding relevant objects plays a central role here; demanding requirements such as low error rates and high detection rates are necessary to prevent corruption of the search index and to enable successful searches. At the same time, enough objects must be indexed to allow statements about the actual content. This thesis deals with the adaptation and optimization of existing detection methods. To this end, a holistic workflow and process system tailored to the high performance demands of video analysis is implemented, with the aim of supporting the development of image recognition algorithms, the visualization of intermediate steps, and their evaluation. The focus is on methods for the structural decomposition of video material and for content analysis in the areas of face detection and pedestrian detection.
1. Motivation
1.1. Placement within the Retrieval Process
1.2. Infrastructure for Optimizing Video Analysis Methods
1.3. Challenges of Image Recognition
1.4. Scientific Contributions of this Thesis
1.5. Chapter Overview
2. Methods and Strategies of Video Analysis
2.1. Fields of Image Recognition
2.1.1. Machine Learning
2.1.2. Machine Vision
2.1.3. Computer Vision
2.1.4. Pattern Recognition
2.2. Structural Analysis of Generic Pattern Recognition Systems
2.2.1. Data Acquisition
2.2.2. Pattern Analysis
2.2.3. Pattern Classification
2.2.4. Image Recognition Systems
2.2.5. Knowledge Discovery in Databases
2.3. Image Recognition in Content-Based Image Retrieval
2.3.1. Paradigms
2.3.2. Image Signatures
2.3.3. Signature Types
2.3.4. Learning Techniques
2.4. Overview of Holistic Image Recognition Systems
2.4.1. A Segment- and Contour-Based CBIR System
2.4.2. Biologically Inspired Systems
2.4.3. Learning from Few Examples
2.5. Object Recognition in Scene Context
2.6. Current Limits of Pattern and Object Recognition
2.7. Concept of a Generic Workflow for Object Recognition in Videos
2.7.1. Structural Analysis
2.7.2. Content Analysis
2.7.3. Extension of the Classical Object Recognition Paradigm
2.7.4. Application Domains
2.8. Conclusion
3. System Architecture for Optimizing Image Recognition Methods
3.1. Preliminary Considerations
3.1.1. Software Engineering Requirements
3.1.2. Assessing System Performance
3.1.3. Input and Output
3.1.4. Modeling Domain Knowledge
3.1.5. Discriminability of Features
3.1.6. Summary
3.2. Architecture of the Overall System
3.3. Structure of AMOPA
3.3.1. Use of Process Chains
3.3.2. Image and Video Processing
3.4. Annotation of Images and Videos
3.4.1. An Annotation Tool for Videos
3.4.2. An Approach to Annotation, Classification, and Evaluation
3.5. Conclusion
4. Video Segmentation
4.1. Shot Boundary Detection
4.1.1. Structure of Videos
4.1.2. Classical Methods
4.1.3. TRECVid: Evaluation Campaign and Datasets
4.1.4. The AT&T Method
4.2. Shot Composition and Similarity
4.2.1. Dominant Color Descriptor
4.2.2. Color Layout Descriptor
4.2.3. Scalable Color Descriptor
4.2.4. Edge Histogram Descriptor
4.3. Design and Implementation
4.3.1. Integration into the AMOPA Process Concept
4.3.2. Choice of Color Space
4.3.3. Motion Analysis
4.3.4. Determination and Verification of Cut Candidates
4.3.5. Presentation and Storage of Results
4.4. Evaluation and Optimization of Hard Cut Detection
4.4.1. The TRECVid Evaluation Methodology
4.4.2. Optimization of Recall and Runtime
4.4.3. Optimization of Precision
4.4.4. Validation of Results
4.5. Conclusion
5. Face Detection
5.1. State of the Art
5.1.1. Classes of Methods and Datasets
5.1.2. Boosting Methods
5.2. Implementation of a Pattern Classification System
5.2.1. Training Phase
5.2.2. Classification Using Detector Chains
5.2.3. Learning a Boosted Face Classifier
5.2.4. Excursus: Face Localization Using Swarm Intelligence
5.3. Evaluation
5.3.1. The TS100 Dataset
5.3.2. Annotation of Faces in Unconstrained Domains
5.3.3. Evaluation Methodology and Discussion of Results
5.4. Conclusion
6. Detection of Further Object Classes, Using Persons as an Example
6.1. Features for Person Detection
6.2. Datasets
6.3. Evaluation of Features on Different Datasets
6.3.1. Evaluation Methodology
6.3.2. Analysis and Discussion of Results
6.4. Evaluation of a Cascaded Classification System
6.4.1. System Architecture and Training
6.4.2. Classification and Evaluation
6.5. Conclusion
7. Summary and Outlook
Appendix
A. Overview of the Shot Boundary Detection Experiments
A.1. Configuration and Runtimes of the Experiments
A.2. Stage I: Color Space and Motion Estimation
A.3. Stage II: Optimization of Precision
A.4. Real-Time Data Visualization
A.5. Visualization of Individual Components by Example
B. Supplements to the Face Detection Experiments
B.1. Training Progress of the TUC FD Classifier
B.2. Overview of Minimum Detection Sizes on TS100
B.3. Visualization of Detections on TS100
C. System Configuration
List of Abbreviations and Terms
Bibliography
The data collections of local television stations often comprise several tens of thousands of video tapes. Modern methods are needed to exploit the content of such archives. While the retrieval of objects plays a fundamental role, essential requirements include low false-detection rates and high detection rates in order to prevent the corruption of the search index.
However, a sufficient number of objects needs to be found to draw conclusions about the content explored. This work focuses on the adjustment and optimization of existing detection techniques. Therefore, the author develops a holistic framework that directly reflects the high demands of video analysis, with the aim of facilitating the development of image processing algorithms, the visualization of intermediate results, and their evaluation and optimization. The effectiveness of the system is demonstrated on the structural decomposition of video footage and on content-based detection of faces and pedestrians. info:eu-repo/classification/ddc/000 ddc:000 info:eu-repo/classification/ddc/004 ddc:004 info:eu-repo/classification/ddc/005 ddc:005 info:eu-repo/classification/ddc/006 ddc:006 Machine learning; Boosting
532	ENERGY EFFICIENT EDGE INFERENCE SYSTEMS Soumendu Kumar Ghosh (14060094) 07 August 2023 Deep Learning (DL)-based edge intelligence has garnered significant attention in recent years due to the rapid proliferation of the Internet of Things (IoT), embedded, and intelligent systems, collectively termed edge devices. Sensor data streams acquired by these edge devices are processed by a Deep Neural Network (DNN) application that runs on the device itself or in the cloud. However, the high computational complexity and energy consumption of processing DNNs often limit their deployment on these edge inference systems due to limited compute, memory and energy resources. Furthermore, high costs, strict application latency demands, data privacy, security constraints, and the absence of reliable edge-cloud network connectivity heavily impact edge application efficiency in the case of cloud-assisted DNN inference. Inevitably, performance and energy efficiency are of utmost importance in these edge inference systems, aside from the accuracy of the application. To facilitate energy-efficient edge inference systems running computationally complex DNNs, this dissertation makes three key contributions.
The first contribution adopts a full-system approach to Approximate Computing, a design paradigm that trades off a small degradation in application quality for significant energy savings. Within this context, we present the foundational concepts of AxIS, the first approximate edge inference system that jointly optimizes the constituent subsystems, leading to substantial energy benefits compared to optimization of the individual subsystems. To illustrate the efficacy of this approach, we demonstrate multiple versions of an approximate smart camera system that executes various DNN-based unimodal computer vision applications, showcasing how the sensor, memory, compute, and communication subsystems can all be synergistically approximated for energy-efficient edge inference.
Building on this foundation, the second contribution extends AxIS to multimodal AI, harnessing data from multiple sensor modalities to impart human-like cognitive and perceptual abilities to edge devices. By exploring optimization techniques for multiple sensor modalities and subsystems, this research reveals the impact of synergistic modality-aware optimizations on system-level accuracy-efficiency (AE) trade-offs, culminating in the introduction of SysteMMX, the first AE scalable cognitive system that allows efficient multimodal inference at the edge. To illustrate the practicality and effectiveness of this approach, we present an in-depth case study centered around a multimodal system that leverages RGB and Depth sensor modalities for image segmentation tasks.
The final contribution focuses on optimizing the performance of an edge-cloud collaborative inference system through intelligent DNN partitioning and computation offloading. We delve into the realm of distributed inference across edge devices and cloud servers, unveiling the challenges associated with finding the optimal partitioning point in DNNs for significant inference latency speedup. To address these challenges, we introduce PArtNNer, a platform-agnostic and adaptive DNN partitioning framework capable of dynamically adapting to changes in communication bandwidth and cloud server load. Unlike existing approaches, PArtNNer does not require pre-characterization of underlying edge computing platforms, making it a versatile and efficient solution for real-world edge-cloud scenarios.
Overall, this thesis provides novel insights, innovative techniques, and intelligent solutions to enable energy-efficient AI at the edge. The contributions presented herein serve as a solid foundation for future researchers to build upon, driving innovation and shaping the trajectory of research in edge AI. Computer vision Energy-efficient computing Deep learning Edge AI deep learning at IoT edge collaborative AI Edge inference embedded systems (ES) deep neural networks (DNNs) Object detection and classification Approximate Computing Approximate Systems energy efficiency Accuracy - Efficiency trade-off Multimodal Deep Learning
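To make the notion of a "partitioning point" in the entry above concrete, the following is a minimal sketch in Python. It is not PArtNNer's algorithm: it assumes that per-layer latencies on the edge device and the cloud server are already measured, which is exactly the pre-characterization the abstract says PArtNNer avoids. The function name, parameters, and all numbers are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def best_partition(
    edge_ms: List[float],
    cloud_ms: List[float],
    out_bytes: List[int],
    input_bytes: int,
    bandwidth_bytes_per_ms: float,
) -> Tuple[int, float]:
    """Return (k, latency_ms): layers [0, k) run on the edge, [k, n) in the cloud.

    k == 0 means cloud-only (the raw input is uploaded); k == n means edge-only
    (nothing is uploaded). All inputs are hypothetical, pre-measured values.
    """
    n = len(edge_ms)
    if not (len(cloud_ms) == n and len(out_bytes) == n):
        raise ValueError("per-layer lists must have the same length")

    best_k, best_latency = 0, float("inf")
    for k in range(n + 1):
        edge_time = sum(edge_ms[:k])      # layers executed on the device
        cloud_time = sum(cloud_ms[k:])    # remaining layers executed remotely
        if k == n:
            transfer = 0.0                # edge-only: no upload at all
        else:
            # Upload the raw input (k == 0) or the output of layer k - 1.
            payload = input_bytes if k == 0 else out_bytes[k - 1]
            transfer = payload / bandwidth_bytes_per_ms
        total = edge_time + transfer + cloud_time
        if total < best_latency:
            best_k, best_latency = k, total
    return best_k, best_latency


# Hypothetical three-layer network.
EDGE_MS = [10.0, 20.0, 40.0]
CLOUD_MS = [1.0, 2.0, 4.0]
OUT_BYTES = [4000, 1000, 10]
INPUT_BYTES = 6000

for bandwidth in (10.0, 100.0, 1000.0):
    k, latency = best_partition(EDGE_MS, CLOUD_MS, OUT_BYTES, INPUT_BYTES, bandwidth)
    print(f"bandwidth={bandwidth} bytes/ms -> split after layer {k}, {latency} ms")
```

With these made-up numbers the best split moves from edge-only at 10 bytes/ms, to a split after the second layer at 100 bytes/ms, to cloud-only at 1000 bytes/ms. This sensitivity to bandwidth is the reason an adaptive scheme is attractive.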
533	Robust Subspace Estimation Using Low-rank Optimization. Theory And Applications In Scene Reconstruction, Video Denoising, And Activity Recognition. Oreifej, Omar 01 January 2013 In this dissertation, we discuss the problem of robust linear subspace estimation using low-rank optimization and propose three formulations of it. We demonstrate how these formulations can be used to solve fundamental computer vision problems, and provide superior performance in terms of accuracy and running time. Consider a set of observations extracted from images (such as pixel gray values, local features, trajectories, etc.). If the assumption that these observations are drawn from a linear subspace (or can be linearly approximated) is valid, then the goal is to represent each observation as a linear combination of a compact basis, while maintaining a minimal reconstruction error. One of the earliest, yet most popular, approaches to achieve that is Principal Component Analysis (PCA). However, PCA can only handle Gaussian noise, and thus suffers when the observations are contaminated with gross and sparse outliers. To this end, in this dissertation, we focus on estimating the subspace robustly using low-rank optimization, where the sparse outliers are detected and separated through the ℓ1 norm. The robust estimation has a two-fold advantage: First, the obtained basis better represents the actual subspace because it does not include contributions from the outliers. Second, the detected outliers are often of specific interest in many applications, as we will show throughout this thesis. We demonstrate four different formulations and applications for low-rank optimization. First, we consider the problem of reconstructing an underwater sequence by removing the turbulence caused by the water waves.
The main drawback of most previous attempts to tackle this problem is that they heavily depend on modelling the waves, which in fact is ill-posed since the actual behavior of the waves along with the imaging process are complicated and include several noise components; therefore, their results are not satisfactory. In contrast, we propose a novel approach which outperforms the state of the art. The intuition behind our method is that in a sequence where the water is static, the frames would be linearly correlated. Therefore, in the presence of water waves, we may consider the frames as noisy observations drawn from the subspace of linearly correlated frames. However, the noise introduced by the water waves is not sparse, and thus cannot directly be detected using low-rank optimization. Therefore, we propose a data-driven two-stage approach, where the first stage “sparsifies” the noise, and the second stage detects it. The first stage leverages the temporal mean of the sequence to overcome the structured turbulence of the waves through an iterative registration algorithm. The result of the first stage is a high-quality mean and a better structured sequence; however, the sequence still contains unstructured sparse noise. Thus, we employ a second stage at which we extract the sparse errors from the sequence through rank minimization. Our method converges faster, and drastically outperforms the state of the art on all testing sequences. Secondly, we consider a closely related situation where an independently moving object is also present in the turbulent video. More precisely, we consider video sequences acquired in desert battlefields, where atmospheric turbulence is typically present, in addition to independently moving targets. Typical approaches for turbulence mitigation follow averaging or de-warping techniques. Although these methods can reduce the turbulence, they distort the independently moving objects, which can often be of great interest.
Therefore, we address the problem of simultaneous turbulence mitigation and moving object detection. We propose a novel three-term low-rank matrix decomposition approach in which we decompose the turbulence sequence into three components: the background, the turbulence, and the object. We simplify this extremely difficult problem into a minimization of the nuclear norm, the Frobenius norm, and the ℓ1 norm. Our method is based on two observations: First, the turbulence causes dense and Gaussian noise, and therefore can be captured by the Frobenius norm, while the moving objects are sparse and thus can be captured by the ℓ1 norm. Second, since the object’s motion is linear and intrinsically different from the Gaussian-like turbulence, a Gaussian-based turbulence model can be employed to enforce an additional constraint on the search space of the minimization. We demonstrate the robustness of our approach on challenging sequences which are significantly distorted with atmospheric turbulence and include extremely tiny moving objects. In addition to robustly detecting the subspace of the frames of a sequence, we consider using trajectories as observations in the low-rank optimization framework. In particular, in videos acquired by moving cameras, we track all the pixels in the video and use that to estimate the camera motion subspace. This is particularly useful in activity recognition, which typically requires standard preprocessing steps such as motion compensation, moving object detection, and object tracking. The errors from the motion compensation step propagate to the object detection stage, resulting in missed detections, which further complicate the tracking stage, resulting in cluttered and incorrect tracks. In contrast, we propose a novel approach which does not follow the standard steps, and accordingly avoids the aforementioned difficulties.
Our approach is based on Lagrangian particle trajectories, which are a set of dense trajectories obtained by advecting optical flow over time, thus capturing the ensemble motions of a scene. This is done in frames of unaligned video, and no object detection is required. In order to handle the moving camera, we decompose the trajectories into their camera-induced and object-induced components. Having obtained the relevant object motion trajectories, we compute a compact set of chaotic invariant features, which captures the characteristics of the trajectories. Consequently, an SVM is employed to learn and recognize the human actions using the computed motion features. We performed intensive experiments on multiple benchmark datasets, and obtained promising results. Finally, we consider a more challenging problem referred to as complex event recognition, where the activities of interest are complex and unconstrained. This problem typically poses significant challenges because it involves videos of highly variable content, noise, length, frame size, etc. In this extremely challenging task, high-level features have recently shown a promising direction as in [53, 129], where core low-level events referred to as concepts are annotated and modelled using a portion of the training data, then each event is described using its content of these concepts. However, because of the complex nature of the videos, both the concept models and the corresponding high-level features are significantly noisy. In order to address this problem, we propose a novel low-rank formulation, which combines the precisely annotated videos used to train the concepts with the rich high-level features. Our approach finds a new representation for each event, which is not only low-rank, but also constrained to adhere to the concept annotation, thus suppressing the noise, and maintaining a consistent occurrence of the concepts in each event.
Extensive experiments on the large-scale, real-world TRECVID Multimedia Event Detection 2011 and 2012 datasets demonstrate that our approach consistently improves the discriminative power of the high-level features by a significant margin. low rank representation low rank sparse representation sparse activity recognition turbulence mitigation video denoising complex event recognition nuclear norm augmented lagrange multiplier camera motion estimation trecvid hoha water waves rank trajectories particle advection registration decomposition moving object detection background subtraction atmospheric turbulence Computer Engineering Engineering
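As a reading aid for the entry above, the low-rank decompositions it describes are commonly written in the following form. This is a sketch in assumed notation, not taken from the dissertation: $F$ stacks the vectorized frames (or trajectories) as columns, $A$ is the low-rank part, $E$ and $O$ are error terms, and $\lambda, \tau > 0$ are hypothetical trade-off weights. Whether the Frobenius term is squared is also an assumption.

```latex
% Two-term robust subspace estimation: low-rank part A plus sparse outliers E.
\min_{A,\,E}\ \|A\|_{*} + \lambda \|E\|_{1}
\quad \text{subject to} \quad F = A + E

% Three-term variant for turbulence with moving objects:
% A = background (low rank), O = moving object (sparse),
% E = turbulence (dense, Gaussian-like).
\min_{A,\,O,\,E}\ \|A\|_{*} + \tau \|O\|_{1} + \lambda \|E\|_{F}^{2}
\quad \text{subject to} \quad F = A + O + E
```

Here $\|A\|_{*}$ is the nuclear norm (the sum of the singular values of $A$), a convex surrogate for rank, and $\|\cdot\|_{1}$ is the sum of absolute entries, a convex surrogate for sparsity. The abstract's additional Gaussian-based turbulence constraint is not modelled in this sketch.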
534	Screw Hole Detection in Industrial Products using Neural Network based Object Detection and Image Segmentation : A Study Providing Ideas for Future Industrial Applications / Skruvhålsdetektering på Industriella Produkter med hjälp av Neurala Nätverksbaserade Objektdetektering och Bildsegmentering : En Studie som Erbjuder Ideér för Framtida Industriella Applikationer Melki, Jakob January 2022 This project is about screw hole detection using neural networks for automated assembly and disassembly. Many industrial companies, such as Ericsson AB, make products such as radio units or filters that have a lot of screw holes. Assembling and disassembling these products is therefore very time-consuming and demanding for a human. The problem statement in this project is to investigate the performance of neural networks within object detection and semantic segmentation to detect screw holes in industrial products. Different industrial models were created and synthetic data was generated in Blender. Two types of experiments were done. The first compared an object detection algorithm (Faster R-CNN) with a semantic segmentation algorithm (SegNet) to see which approach is most suitable for hole detection. The results showed that semantic segmentation outperforms object detection when it comes to detecting multiple small holes. The second experiment further investigated semantic segmentation algorithms by adding U-Net, PSPNet and LinkNet to the comparison. The networks U-Net and LinkNet were the most successful ones and achieved a Mean Intersection over Union (MIoU) of around 0.9, which shows that they have potential for further development. Thus, the conclusion drawn in this project is that segmentation algorithms are more suitable for hole detection than object detection algorithms.
Furthermore, the results of U-Net and LinkNet show that neural networks for semantic segmentation have potential for detecting screw holes. Future work could include creating more advanced product models, investigating other segmentation networks and tuning hyperparameters. Artificial intelligence (AI) Automated assembly and disassembly Computer vision Machine learning Neural networks Object detection Screw hole detection Semantic segmentation Electrical Engineering and Electronics
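Entry 534 reports segmentation quality as Mean Intersection over Union (MIoU). As a rough illustration of that metric only, and not the thesis's evaluation code, the sketch below computes MIoU on two flattened label masks. The function name `mean_iou`, the toy masks and the label convention (0 = background, 1 = screw hole) are assumptions made for this example.

```python
def mean_iou(pred, target, num_classes):
    """Average the per-class IoU over classes that occur in pred or target."""
    ious = []
    for c in range(num_classes):
        # Pixels labelled c in both masks (intersection) or in either mask (union)
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union == 0:
            continue  # class absent from both masks, so it is skipped
        ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Hypothetical flattened 2x4 masks: 0 = background, 1 = screw hole
target = [0, 0, 1, 1, 0, 0, 1, 1]
pred = [0, 0, 1, 1, 0, 1, 1, 0]
print(mean_iou(pred, target, num_classes=2))
```

Both classes here overlap on 3 of 5 union pixels, so each contributes an IoU of 3/5 to the mean.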
535	Improving Deep Learning-based Object Detection Algorithms for Omnidirectional Images by Simulated Data Scheck, Tobias 08 August 2024 (has links) Perception, primarily through vision, is a vital human ability that informs decision-making and interactions with the world. Computer Vision, the field dedicated to emulating this human capability in computers, has witnessed transformative progress with the advent of artificial intelligence, particularly neural networks and deep learning. These technologies enable automatic feature learning, eliminating the need for laborious hand-crafted features. The increasing global demand for artificial intelligence applications across various industries, however, raises concerns about data privacy and access. This dissertation addresses these challenges by proposing solutions that leverage synthetic data to preserve privacy and enhance the robustness of computer vision algorithms. The primary objective of this dissertation is to reduce the dependence on real data for modern image processing algorithms by utilizing synthetic data generated through computer simulations. Synthetic data serves as a privacy-preserving alternative, enabling the generation of data in scenarios that are difficult or unsafe to replicate in the real world. While purely simulated data falls short of capturing the full complexity of reality, the dissertation explores methods to bridge the gap between synthetic and real data. The dissertation encompasses a comprehensive evaluation of the synthetic THEODORE dataset, focusing on object detection using Convolutional Neural Networks. Fine-tuning CNN architectures with synthetic data demonstrates remarkable performance improvements over relying solely on real-world data. Extending beyond person recognition, these architectures exhibit the ability to recognize various objects in real-world settings. 
This work also investigates real-time performance and the impact of barrel distortion in omnidirectional images, underlining the potential of using synthetic data. Furthermore, the dissertation introduces two unsupervised domain adaptation methods tailored for anchorless object detection within the CenterNet architecture. The methods effectively reduce the domain gap when synthetic omnidirectional images serve as the source domain, and real images act as the target domain. Qualitative assessments highlight the advantages of these methods in reducing noise and enhancing detection accuracy. The dissertation concludes by creating an application within the Ambient Assisted Living context to realize the concepts. This application encompasses indoor localization heatmaps, human pose estimation, and activity recognition. The methodology leverages synthetically generated data, unique object identifiers, and rotated bounding boxes to enhance tracking in omnidirectional images. Importantly, the system is designed to operate without compromising privacy or using sensitive images, aligning with the growing concerns of data privacy and access in artificial intelligence applications. info:eu-repo/classification/ddc/621.3 ddc:621.3
536	Uncertainty Estimation and Confidence Calibration in YOLO5Face Savinainen, Oskar January 2024 (has links) This thesis investigates predicting the Intersection over Union (IoU) in detections made by the face detector YOLO5Face, which is done to use the predicted IoU as a new uncertainty measure. The detections are done on the face dataset WIDER FACE, and the prediction of IoU is made by adding a parallel head to the existing YOLO5Face architecture. Experiments show that the methodology for predicting the IoU used in this thesis does not work and the parallel prediction head fails to predict the IoU and instead resorts to predicting common IoU values. The localisation confidence and classification confidences of YOLO5Face are then investigated to find out which confidence measure is least uncertain and most suitable to use when identifying faces. Experiments show that the localisation confidence is consistently more calibrated than the classification confidence. The classification confidence is then calibrated with respect to the localisation confidence which reduces the Expected Calibration Error (ECE) for classification confidence from 0.17 to 0.01. Computer Vision Machine Learning YOLO YOLOv5 YOLO5Face Uncertainty Estimation Confidence Calibration Face Detection Temperature Scaling Object Detection Neural Networks IoU Predicting IoU IoU prediction Computer Sciences Datavetenskap (datalogi) Computer Systems Datorsystem Signal Processing Signalbehandling
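Entry 536 summarises calibration with the Expected Calibration Error (ECE). The sketch below shows one common definition of ECE, using equal-width confidence bins. The thesis's exact binning is not stated in the abstract, so the bin count, the function name `expected_calibration_error` and the toy detections are all assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted mean gap between average confidence and accuracy per bin."""
    bins = {}
    for conf, hit in zip(confidences, correct):
        # Equal-width bins over [0, 1]; a confidence of 1.0 goes in the last bin
        b = min(int(conf * n_bins), n_bins - 1)
        bins.setdefault(b, []).append((conf, hit))
    n = len(confidences)
    ece = 0.0
    for members in bins.values():
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += len(members) / n * abs(avg_conf - accuracy)
    return ece


# Hypothetical detections: confidence scores and whether each was correct (1/0)
confidences = [0.95, 0.95, 0.55, 0.55]
correct = [1, 1, 1, 0]
print(expected_calibration_error(confidences, correct))
```

A perfectly calibrated detector would have confidence equal to accuracy in every bin, giving an ECE of 0.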
537	From Pixels to Predators: Wildlife Monitoring with Machine Learning / Från Pixlar till Rovdjur: Viltövervakning med Maskininlärning Eriksson, Max January 2024 (has links) This master’s thesis investigates the application of advanced machine learning models for the identification and classification of Swedish predators using camera trap images. With the growing threats to biodiversity, there is an urgent need for innovative and non-intrusive monitoring techniques. This study focuses on the development and evaluation of object detection models, including YOLOv5, YOLOv8, YOLOv9, and Faster R-CNN, aiming to enhance the surveillance capabilities of Swedish predatory species such as bears, wolves, lynxes, foxes, and wolverines. The research leverages a dataset from the NINA database, applying data preprocessing and augmentation techniques to ensure robust model training. The models were trained and evaluated using various dataset sizes and conditions, including day and night images. Notably, YOLOv8 and YOLOv9 underwent extended training for 300 epochs, leading to significant improvements in performance metrics. The performance of the models was evaluated using metrics such as mean Average Precision (mAP), precision, recall, and F1-score. YOLOv9, with its innovative Programmable Gradient Information (PGI) and GELAN architecture, demonstrated superior accuracy and reliability, achieving an F1-score of 0.98 on the expanded dataset. The research found that training models on images captured during both day and night jointly versus separately resulted in only minor differences in performance. However, models trained exclusively on daytime images showed slightly better performance due to more consistent and favorable lighting conditions. The study also revealed a positive correlation between the size of the training dataset and model performance, with larger datasets yielding better results across all metrics. 
However, the marginal gains decreased as the dataset size increased, suggesting diminishing returns. Among the species studied, foxes were the least challenging for the models to detect and identify, while wolves presented more significant challenges, likely due to their complex fur patterns and coloration blending with the background. Machine Learning Project Ngulia YOLO YOLOv9 YOLOv8 YOLOv5 Faster R-CNN Wildlife Monitoring Deep Learning Camera Traps Object Detection Image Processing Animal Detection Neural Networks Transfer Learning Media and Communication Technology
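Entry 537 evaluates its detectors with precision, recall and F1-score. As a minimal sketch of how those three quantities relate, the helper below derives them from true-positive, false-positive and false-negative counts. The function name and the counts are hypothetical and are not taken from the thesis.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for one species class
precision, recall, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(precision, recall, f1)
```

Because F1 is a harmonic mean, it stays close to the lower of the two values, so a high F1 requires both few false detections and few missed animals.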
538	Towards meaningful and data-efficient learning : exploring GAN losses, improving few-shot benchmarks, and multimodal video captioning Huang, Gabriel 09 1900 (has links) In recent years, the field of deep learning has seen tremendous progress in applications ranging from image generation, object detection, and language modeling to visual question answering. Classic approaches such as supervised learning require large amounts of task-specific and labeled data, which may be too expensive, time-consuming, or impractical to collect. Data-efficient methods, such as few-shot and self-supervised learning, attempt to deal with the limited availability of task-specific data by leveraging large amounts of general data. Progress in deep learning, and in particular, few-shot learning, is largely driven by the relevant benchmarks, evaluation metrics, and datasets. They are used to test and compare different methods on a given task, and determine the state-of-the-art.
However, due to being idealized versions of the task to solve, benchmarks are rarely equivalent to the original task, and can have several limitations which hinder their role of identifying the most promising research directions. Moreover, defining meaningful evaluation metrics can be challenging, especially in the case of high-dimensional and structured outputs, such as images, audio, speech, or text. This thesis discusses the limitations and perspectives of existing benchmarks, training losses, and evaluation metrics, with a focus on generative modeling—Generative Adversarial Networks (GANs) in particular—and data-efficient modeling, which includes few-shot and self-supervised learning. The first contribution is a discussion of the generative modeling task, followed by an exploration of theoretical and empirical properties of the GAN loss. The second contribution is a discussion of a limitation of few-shot classification benchmarks, which is that they may not require class semantic generalization to be solved, and the proposal of a baseline method for solving them without test-time labels. The third contribution is a survey of few-shot and self-supervised object detection, which points out the limitations and promising future research for the field. Finally, the fourth contribution is a data-efficient method for video captioning, which leverages unsupervised text and video datasets, and explores several multimodal pretraining strategies. 
self-supervised learning few-shot classification few-shot object detection low-data learning object detection instance segmentation representation learning residual network visual transformer Faster R-CNN DETR parametric adversarial divergence generative adversarial network variational auto-encoder maximum-likelihood structured prediction optimal discriminator mutual information implicit generative model multimodal pretraining dense video captioning cross-attention YouCook2 HowTo-100M Youtube-8M Recipe-1M Pascal VOC MSCOCO LVIS mutual information neural estimation ResNet ViT GAN VAE MINE
539	Cars in Sweden's Cinema & Television : AI-Guided Research of Automobiles in Sweden’s Images from 1950-1980 Steck, Maximilian January 2021 (has links) This research project centers on the cinematic and societal representation of the automobile in post-war Swedish cinema and television. Owing to its political neutrality during World War II, Sweden’s economy benefited from an extensive surplus immediately after Germany’s capitulation in 1945. This economic prosperity carried over to Swedish society and enabled an already high degree of motorization among Swedes by the mid-1950s, while neighboring European countries struggled to rebuild infrastructure, basic food supply lines and often entire cities. One might therefore conclude that Swedes had a favorable attitude towards cars from the beginning, and that this attitude is reflected in some form of cultural memory. However, Stig Dagerman’s 1948 short story “To Kill a Child” (Att döda ett barn), realized as a short film in 1953, outlines a rather suspicious and cautious attitude towards automobiles. The mass-media portrayal of cars in Swedish cinema and television was analyzed with current AI techniques to observe notable changes in imagery, themes and attitudes surrounding cars over 30 years. Filmarkivet.se served as the main source, with 114 currently available media artifacts from 1950 to 1980, including a wide spectrum of footage, i.e., weekly newsreels, private filmmakers’ collections, television commercials, movie trailers, political campaigns and documentary formats. This source material proved diverse in nature and accurately representative of the Swedish mass media of its time, as it varied between cinema and television while focusing on the daily life of individuals or daily life in Sweden’s cities.
While AI object recognition helped identify pertinent sections within a large corpus of film data, a qualitative tf-idf analysis of selected films, based on speech-to-text output, was subsequently conducted to counterbalance the quantitative research approaches. archive AI cars cultural memory digital humanities film New Cinema History object detection speech recognition Swedish film oil crisis H-Day representation Wild Strawberries (Smultronstället) Media Studies Studies on Film
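Entry 539 mentions a tf-idf analysis of speech-to-text output. The sketch below shows the basic tf-idf weighting (term frequency times the log of inverse document frequency) on tokenised transcripts. The abstract does not specify which tf-idf variant was used, so this plain formulation, the function name `tf_idf` and the toy transcripts are assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per tokenised document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical transcripts, already lower-cased and tokenised
transcripts = [
    ["the", "car", "drives", "fast"],
    ["the", "family", "buys", "a", "car"],
    ["the", "tram", "stops"],
]
scores = tf_idf(transcripts)
print(scores[2])
```

Terms that occur in every transcript, such as "the" here, receive a score of zero, while terms confined to a single transcript are weighted highest.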
540	PRODUCT-APPLICATION FIT, CONCEPTUALIZATION, AND DESIGN OF TECHNOLOGIES: PROSTHETIC HAND TO MULTI-CORE VAPOR CHAMBERS Soumya Bandyopadhyay (13171827) 29 July 2022 (has links) The path from idea generation to the conceptualization and development of products and technologies is a non-linear and iterative process. The work in this thesis follows a process that begins with a review of existing technologies and products, examining their unique value proposition in the context of the specific applications for which they are designed. Next, the unmet needs of novel or emerging applications that require new products or technologies are identified. Once these user needs and product requirements are identified, the specific functions to be addressed by the product are specified. The subsequent design of products and technologies to meet these functions is enabled by engineering tools such as three-dimensional modelling, physics-based simulations, and manufacturing of a minimum viable prototype. In these steps, unbiased decisions have to be taken using weighted decision matrices to cater to the design requirements. Finally, the minimum viable prototype is tested to demonstrate the principal functionalities. The results of the testing process identify potential improvements for the next generations of the prototype, which subsequently inform the final design of the product. This thesis adopted this methodology to initiate the design of two product prototypes: i) an image-recognition-integrated service (IRIS) robotic hand for children and ii) a cascaded multi-core vapor chamber (CMVC) for improving the performance of next-generation computing systems. Minimum viable product prototypes were manufactured to demonstrate the principal functionalities, followed by clear identification of potential future improvements.
Tests of the prosthetic hand indicate that the image-recognition-based feedback can successfully drive the actuators to perform the intended grasping motions. Experimental testing of the multi-core vapor chamber demonstrates successful performance of the prototype, which offers a notable reduction in temperatures relative to the existing benchmark solid copper spreader. Analog electronics and interfaces CAD/CAM systems Biomechanics Modelling and simulation Image processing Computer graphics Human-computer interaction prosthetic hand sketch idea generation Rapid prototyping Design exploration IoT artificial intelligence object detection method vapor chamber Mechanical Engineering Human-Centered design Human-computer Interaction Simulation and Modeling Input, Output and Data Devices Computer Graphics Methodology Creativity and innovation Interdisciplinary Engineering
