The semantic annotation of large multimedia corpora is essential for numerous tasks. Be it for the training of classification algorithms, efficient content retrieval, or for analytical reasoning, appropriate labels are often the first necessity before automatic processing becomes efficient. However, manual labeling of large datasets is time-consuming and tedious. Hence, we present a new visual approach for labeling and retrieval of reports in multimedia news corpora. It combines automatic classifier training based on caption text from news reports with human interpretation to ease the annotation process. In our approach, users can initialize labels with keyword queries and iteratively annotate examples to train a classifier. The proposed visualization displays representative results in an overview that allows to follow different annotation strategies (e.g., active learning) and assess the quality of the classifier. Based on a usage scenario, we demonstrate the successful application of our approach. Therein, users label several topics which interest them and retrieve related documents with high confidence from three years of news reports.
Identifer | oai:union.ndltd.org:DRESDEN/oai:qucosa:de:qucosa:32804 |
Date | 25 January 2019 |
Creators | Han, Qi, John, Markus, Kurzhals, Kuno, Messner, Johannes, Ertl, Thomas |
Source Sets | Hochschulschriftenserver (HSSS) der SLUB Dresden |
Language | English |
Detected Language | English |
Type | doc-type:conferenceObject, info:eu-repo/semantics/conferenceObject, doc-type:Text |
Rights | info:eu-repo/semantics/openAccess |
Relation | urn:nbn:de:bsz:15-qucosa2-327974, qucosa:32797 |
Page generated in 0.0146 seconds