Return to search

Pattern-based segmentation of digital documents: model and implementation

This thesis proposes a new document model, according to which any document can be segmented in some independent components and transformed in a pattern-based projection, that only uses a very small set of objects and composition rules. The point is that such a normalized document expresses the same fundamental information of the original one, in a simple, clear and unambiguous way. The central part of my work consists of discussing that model, investigating how a digital document can be segmented, and how a segmented version can be used to implement advanced tools of conversion. I present seven patterns which are versatile enough to capture the most relevant documents’ structures, and whose minimality and rigour make that implementation possible. The abstract model is then instantiated into an actual markup language, called IML. IML is a general and extensible language, which basically adopts an XHTML syntax, able to capture a posteriori the only content of a digital document. It is compared with other languages and proposals, in order to clarify its role and
objectives. Finally, I present some systems built upon these ideas. These applications are evaluated in terms of users’ advantages, workflow improvements and impact over the overall quality of the output. In particular, they cover heterogeneous content management processes: from web editing to collaboration (IsaWiki and WikiFactory), from e-learning (IsaLearning) to professional printing (IsaPress).

Identiferoai:union.ndltd.org:unibo.it/oai:amsdottorato.cib.unibo.it:370
Date16 April 2007
Creatorsdi Iorio, Angelo <1977>
ContributorsCianciarini, Paolo
PublisherAlma Mater Studiorum - Università di Bologna
Source SetsUniversità di Bologna
LanguageEnglish
Detected LanguageEnglish
TypeDoctoral Thesis, PeerReviewed
Formatapplication/pdf
Rightsinfo:eu-repo/semantics/openAccess

Page generated in 0.0024 seconds