Global ETD Search

1	The User Attribution Problem and the Challenge of Persistent Surveillance of User Activity in Complex Networks Taglienti, Claudio 01 January 2014 (has links) In the context of telecommunication networks, the user attribution problem refers to the challenge faced in recognizing communication traffic as belonging to a given user when information needed to identify the user is missing. This is analogous to trying to recognize a nameless face in a crowd. This problem worsens as users move across many mobile networks (complex networks) owned and operated by different providers. The traditional approach of using the source IP address, which indicates where a packet comes from, does not work when used to identify mobile users. Recent efforts to address this problem by exclusively relying on web browsing behavior to identify users were limited to a small number of users (28 and 100 users). This was due to the inability of solutions to link up multiple user sessions together when they rely exclusively on the web sites visited by the user. This study has tackled this problem by utilizing behavior based identification while accounting for time and the sequential order of web visits by a user. Hierarchical Temporal Memories (HTM) were used to classify historical navigational patterns for different users. Each layer of an HTM contains variable order Markov chains of connected nodes which represent clusters of web sites visited in time order by the user (user sessions). HTM layers enable inference "generalization" by linking Markov chains within and across layers and thus allow matching longer sequences of visited web sites (multiple user sessions). This approach enables linking multiple user sessions together without the need for a tracking identifier such as the source IP address. Results are promising. HTMs can provide high levels of accuracy using synthetic data with 99% recall accuracy for up to 500 users and good levels of recall accuracy of 95 % and 87% for 5 and 10 users respectively when using cellular network data. This research confirmed that the presence of long tail web sites (rarely visited) among many repeated destinations can create unique differentiation. What was not anticipated prior to this research was the very high degree of repetitiveness of some web destinations found in real network data. Accuracy Scalability Complex Network Mobile Networks User-Attribution Problem User Classification Computer Sciences
2	Event-related Collections Understanding and Services Li, Liuqing 18 March 2020 (has links) Event-related collections, including both tweets and webpages, have valuable information, and are worth exploring in interdisciplinary research and education. Unfortunately, such data is noisy, so this variety of information has not been adequately exploited. Further, for better understanding, more knowledge hidden behind events needs to be unearthed. Regarding these collections, different societies may have different requirements in particular scenarios. Some may need relatively clean datasets for data exploration and data mining. Social researchers require preprocessing of information, so they can conduct analyses. General societies are interested in the overall descriptions of events. However, few systems, tools, or methods exist to support the flexible use of event-related collections. In this research, we propose a new, integrated system to process and analyze event-related collections at different levels (i.e., data, information, and knowledge). It also provides various services and covers the most important stages in a system pipeline, including collection development, curation, analysis, integration, and visualization. Firstly, we propose a query likelihood model with pre-query design and post-query expansion to rank a webpage corpus by query generation probability, and retrieve relevant webpages from event-related tweet collections. We further preserve webpage data into WARC files and enrich original tweets with webpages in JSON format. As an application of data management, we conduct an empirical study of the embedded URLs in tweets based on collection development and data curation techniques. Secondly, we develop TwiRole, an integrated model for 3-way user classification on Twitter, which detects brand-related, female-related, and male-related tweeters through multiple features with both machine learning (i.e., random forest classifier) and deep learning (i.e., an 18-layer ResNet) techniques. As guidance to user-centered social research at the information level, we combine TwiRole with a pre-trained recurrent neural network-based emotion detection model, and carry out tweeting pattern analyses on disaster-related collections. Finally, we propose a tweet-guided multi-document summarization (TMDS) model, which generates summaries of the event-related collections by using tweets associated with those events. The TMDS model also considers three aspects of named entities (i.e., importance, relatedness, and diversity) as well as topics, to score sentences in webpages, and then rank selected relevant sentences in proper order for summarization. The entire system is realized using many technologies, such as collection development, natural language processing, machine learning, and deep learning. For each part, comprehensive evaluations are carried out, that confirm the effectiveness and accuracy of our proposed approaches. Regarding broader impact, the outcomes proposed in our study can be easily adopted or extended for further event analyses and service development. / Doctor of Philosophy / Event-related collections, including both tweets and webpages, have valuable information. They are worth exploring in interdisciplinary research and education. Unfortunately, such data is noisy. Many tweets and webpages are not relevant to the events. This leads to difficulties during data analysis of the datasets, as well as explanation of the results. Further, for better understanding, more knowledge hidden behind events needs to be unearthed. Regarding these collections, different groups of people may have different requirements. Some may need relatively clean datasets for data exploration. Some require preprocessing of information, so they can conduct analyses, e.g., based on tweeter type or content topic. General societies are interested in the overall descriptions of events. However, few systems, tools, or methods exist to support the flexible use of event-related collections. Accordingly, we describe our new framework and integrated system to process and analyze event-related collections. It provides varied services and covers the most important stages in a system pipeline. It has sub-systems to clean, manage, analyze, integrate, and visualize event-related collections. It takes an event-related tweet collection as input and generates an event-related webpage corpus by leveraging Wikipedia and the URLs embedded in tweets. It also combines and enriches original tweets with webpages. As an application of data management, we conduct an empirical study of tweets and their embedded URLs. We developed TwiRole for 3-way user classification on Twitter. It detects brand-related, female-related, and male-related tweeters through their profiles, tweets, and images. To aid user-centered social research, we combine TwiRole with an existing emotion detection tool, and carry out tweeting pattern analyses on disaster-related collections. Finally, we propose a tweet-guided multi-document summarization (TMDS) model and service, which generates summaries of the event-related collections by using tweets associated with those events. It extracts important sentences across different topics from webpages, and organizes them in proper order. The entire system is realized using many technologies, such as collection development, natural language processing, machine learning, and deep learning. For each part, comprehensive evaluations help confirm the effectiveness and accuracy of our proposed approaches. Regarding broader impact, our methods and system can be easily adopted or extended for further event analyses and service development. Event-related Collections Collection Development Curation URL Analysis User Classification Tweeting Pattern Analysis Webpages Text Summarization
3	Informační systémy ve vysokém školství s důrazem na identifikaci uživatelů, informačních potřeb a jejich uspokojování / Information systems in higher education. User identification, information needs and their satisfaction Příbramská, Ivana January 2012 (has links) Mgr. Ivana Příbramská: Informační systémy ve vysokém školství s důrazem na identifikaci uživatelů, informačních potřeb a jejich uspokojování (dizertační práce) Abstract Focus of this thesis is on users in tertiary education and on the extent, to which their information needs are met by information systems and resources provided by both higher education institutions and other organizations. Second chapter sets up a theoretical framework, summarizing research in the field of the information seeking and information seeking behavior, models of information seeking, information needs and their satisfaction. Third chapter describes higher education area and its role in information provision to the users (coming from both within and outside the area); all users within this area are classified and their information needs and sources for satisfaction of these needs are listed. Fourth chapter is concerned with information systems in higher education in Czech Republic including classifications used within this field, national systems collecting data from higher education institutions and short description of three local information systems. Final chapter describes two examples of implementation process and further development of two particular information systems used at the Charles University in Prague and on the...

1

Page generated in 0.1186 seconds