We study the possibilities of automatically sorting incoming e-mails. Our primary goal is to identify messages that announce upcoming workshops and conferences, job offers, and newly published books. We are developing a mining tool that extracts this information from messages originating in profession-specific mailing lists. Offers in these mailing lists arrive in HTML, RTF, or plain text format and are written in ordinary natural language. The system is designed to use text mining methods to extract the information and save it in structured form for further processing. We examine how users handle their mail and apply these findings in the development. We address the problems of obtaining the messages, detecting their language and encoding, and estimating the type of each message. Once the kind of information a message carries has been recognized, we can mine the data. Finally, we save the mined information to a database, which allows us to display it in a well-arranged way and to sort and search it according to the user's needs.
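The pipeline described above (obtain a message, decode it, estimate its type, store the result) can be illustrated with a minimal sketch. This is not the thesis's implementation: the keyword lists, the table layout, and the function names `extract_text`, `classify`, `store`, and `process` are all assumptions made for illustration, and simple keyword counting stands in for whatever text mining methods the author actually uses. Language detection and HTML/RTF handling are left out.

```python
import sqlite3
from email import message_from_string
from email.message import Message

# Hypothetical keyword lists; the abstract does not say which features are used.
CATEGORY_KEYWORDS = {
    "conference": ("call for papers", "workshop", "conference", "submission deadline"),
    "job": ("job offer", "vacancy", "position", "applicants"),
    "book": ("new book", "published", "isbn", "publisher"),
}


def extract_text(msg: Message) -> str:
    """Return the first text/plain part, decoded with its declared charset."""
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"  # assumed fallback
            return payload.decode(charset, errors="replace")
    return ""


def classify(text: str) -> str:
    """Pick the category whose keywords occur most often, or 'other' if none occur."""
    lowered = text.lower()
    scores = {
        category: sum(lowered.count(keyword) for keyword in keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "other"


def store(conn: sqlite3.Connection, subject: str, category: str) -> None:
    """Save one classified message to an assumed two-column table."""
    conn.execute("CREATE TABLE IF NOT EXISTS messages (subject TEXT, category TEXT)")
    conn.execute("INSERT INTO messages VALUES (?, ?)", (subject, category))
    conn.commit()


def process(raw: str, conn: sqlite3.Connection) -> str:
    """Parse a raw message, classify its text, store the result, return the category."""
    msg = message_from_string(raw)
    category = classify(extract_text(msg))
    store(conn, msg.get("Subject", ""), category)
    return category
```

Calling `process(raw_message, sqlite3.connect(":memory:"))` on a raw message string returns the estimated category and leaves one row in the `messages` table, which can then be sorted or searched with ordinary SQL queries.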
Identifier | oai:union.ndltd.org:nusl.cz/oai:invenio.nusl.cz:295989 |
Date | January 2011 |
Creators | Šebesta, Jan |
Contributors | Žemlička, Michal, Kopecký, Michal |
Source Sets | Czech ETDs |
Language | Czech |
Detected Language | English |
Type | info:eu-repo/semantics/masterThesis |
Rights | info:eu-repo/semantics/restrictedAccess |