11.
Automatic text classification using a multi-agent framework. Fu, Yueyu. January 2006.
Thesis (Ph.D.)--Indiana University, School of Library and Information Science, 2006. / "Title from dissertation home page (viewed July 12, 2007)." Source: Dissertation Abstracts International, Volume: 67-10, Section: A, page: 3634. Adviser: Javed Mostafa.
12.
Looking for a haystack: selecting data sources in a distributed retrieval system. Scherle, Ryan. January 2006.
Thesis (Ph.D.)--Indiana University, Dept. of Computer Science and Cognitive Science, 2006. / "Title from dissertation home page (viewed July 17, 2007)." Source: Dissertation Abstracts International, Volume: 67-10, Section: B, page: 5859. Advisers: David B. Leake; Michael Gasser.
13.
Health Data Analysis and Predictions. Pandya, Nisarg P. 20 March 2018.
This thesis aims to provide new insights into how viruses spread among people of the same nationality. Our central hypothesis is that viruses spread easily among people of the same nationality when they live close to each other. The thesis builds and examines a network graph of patients, drawn from a health dataset, in which nodes are patients in different geographic locations and edges represent relationships between them. A set of location-based graph attributes, including the Clustering Coefficient, Closeness, and Betweenness, is used to analyze the social context of a patient's geographic location. The Clustering Coefficient measures the probability that a patient's neighbors are themselves connected; we use it to study how viruses spread from one patient to another within the same network. Closeness measures the number of steps required to reach every other patient from a given patient; we use it to analyze how quickly viruses can spread among patients living in the same region. Finally, Betweenness, which counts the number of shortest paths passing through a patient, is used to identify the patients chiefly responsible for transmitting viruses to others. Our analysis extracts information that helps identify the factors behind virus transmission among people and across geographies, and it reports the most prevalent diseases by patients' nationality. We found that Indian patients are most often infected by cold viruses and that these viruses spread easily among patients who live close to each other. This contextual information can help address potential public health issues.
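As an illustrative aside (not part of the thesis), the three attributes named above are standard graph measures. A minimal sketch, assuming a toy patient contact graph built with networkx; the patient identifiers and edges are hypothetical placeholders:

```python
# Sketch only: standard graph metrics over a hypothetical patient network.
import networkx as nx

# Edges connect patients who live close to each other (hypothetical data).
G = nx.Graph()
G.add_edges_from([
    ("p1", "p2"), ("p2", "p3"), ("p1", "p3"),  # a tightly connected neighbourhood
    ("p3", "p4"), ("p4", "p5"),                # a chain out to more isolated patients
])

# Clustering coefficient: probability that a patient's neighbours are connected.
clustering = nx.clustering(G)

# Closeness: how few steps separate a patient from every other patient.
closeness = nx.closeness_centrality(G)

# Betweenness: fraction of shortest paths that pass through a patient.
betweenness = nx.betweenness_centrality(G)

for node in sorted(G.nodes):
    print(node, clustering[node], closeness[node], betweenness[node])
```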
14.
Feature Selection through Visualisation for the Classification of Online Reviews. Koka, Keerthika. 19 October 2017.
The purpose of this work is to prove that visualization is at least as powerful as the best automatic feature selection algorithms. This is achieved by applying our visualization technique to the classification of online reviews into fake and genuine reviews. Our technique uses radial charts and color overlaps to explore the best feature selection for classification through visualization. Every review is treated as a translucent red or blue radial membrane whose dimensions determine the shape of the membrane. This work also shows how dimension ordering and combination are relevant to the feature selection process. In brief, the idea is to give each text review a structure based on certain attributes, compare how similar or different the structures of the same or different categories are, and highlight the key features that contribute most to the classification. Colors and saturations aid the feature selection process. Our visualization technique helps the user gain insight into high-dimensional data by providing means to eliminate the worst features right away, pick good features without statistical aids, and understand the behavior of the dimensions in different combinations. This work outlines the approaches explored, the results, and their analysis.
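As an illustrative aside (not the thesis system), the "translucent membrane" idea can be approximated with a polar (radar) plot in which each review is a semi-transparent polygon colored by class. A minimal sketch using matplotlib; the feature names and values are hypothetical placeholders:

```python
# Sketch only: draw each review as a translucent polygon on a radial chart.
import numpy as np
import matplotlib.pyplot as plt

features = ["exclaim_ratio", "avg_word_len", "rating_extremity", "first_person", "review_len"]
reviews = [
    (np.array([0.9, 0.3, 0.8, 0.7, 0.2]), "fake"),     # hypothetical feature values
    (np.array([0.2, 0.6, 0.3, 0.4, 0.8]), "genuine"),
]

angles = np.linspace(0, 2 * np.pi, len(features), endpoint=False)
angles = np.concatenate([angles, angles[:1]])            # close the polygon

ax = plt.subplot(projection="polar")
for values, label in reviews:
    vals = np.concatenate([values, values[:1]])
    colour = "red" if label == "fake" else "blue"
    ax.fill(angles, vals, color=colour, alpha=0.3)       # translucent "membrane"
    ax.plot(angles, vals, color=colour, linewidth=1)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(features)
plt.show()
```

Overlapping colors and saturations then hint at which dimensions separate the two classes, which is the visual cue exploited for feature selection.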
15.
Community-based Networks for Challenged Environments. Vigil-Hayes, Morgan Ashlee. 14 September 2017.
The Internet as a networked system has been rendered more complex than ever before as human endpoints are grafted into the system via increasingly pervasive and personalized networked devices. According to the United Nations, the Internet is a transnational enabler of a number of human rights, and as such, access to the Internet has been proclaimed a basic right in itself. Unfortunately, even as networked devices have become ubiquitous, access to the Internet has not. In many cases, the reasons behind this digital divide involve contextual challenges such as limited infrastructure, limited economic viability, and rugged terrain. In this dissertation, we seek to ameliorate these challenges by designing data-driven, community-based network infrastructure.

In order to extend Internet connectivity to communities located in some of the most challenging contexts, we start by studying how Internet connectivity is used when communities receive initial Internet access. We do this by partnering with two ISPs (Internet service providers) that brought initial Internet connectivity to two geographic regions in Indian Country. The data we have collected from these two ISPs totals 115 TB generated over a combined three years of partnership. Our ISP collaborators serve a total of 1,300 subscribers, who are residents of 14 different Native American reservations representing 18 different tribes. The service areas of these ISPs include predominantly rural communities located on mountainous and forested terrain. Key findings from our analysis of data generated by these ISPs include the prevalence of social media and streaming content, the locality of interest with respect to social media content, and the similarity of Web browsing preferences between households and the aggregate communities to which they belong. We augment our analysis of network traces collected from ISPs with analysis of data collected from some of the most prevalent social media platforms. One of our studies mines Instagram trace data collected from Instagram servers to better understand the relationship between network infrastructure capacity and social media usage patterns. We found that users interact with only a small percentage of the content available to them over social media platforms and that only a small portion of available bandwidth is needed to support interaction with this content. Moreover, in our analysis of the diffusion of content disseminated by Native American advocates on Twitter, we found that the rate of diffusion and the prevalence of content are tied to its media richness, and that richer content does not guarantee rapid diffusion or longevity in the network. Based on the results of our analyses as well as findings in related work, we design four community-based network technologies that address the network challenges associated with rural and developing contexts.

First, we introduce a social media content distribution system that operates over FM radio [200]. In order to deliver content over a 1.2 Kbps technology (the Radio Broadcast Data System), we create a graph-based metric, the cumulative clustering coefficient, to filter content based on its total audience size and the diversity of its audience scope. We evaluate this delivery system using a trace-based simulation and find that 81% of users received at least half of their content requests and that 35.5% of the 1.1 million requested Instagram photos were transmitted to users. Next, we introduce FiDO [203], a community-based Web browsing agent and content delivery system that enables users from disconnected households to opportunistically collect relevant content for themselves and members of their households from content caches co-located with cellular base stations. We evaluate FiDO using a trace-driven simulation that combines Web traces collected from one of our partner ISPs with statistical models parameterized with census and transportation data. We find that an average of 80% of a household's cacheable Web files can be delivered opportunistically and that, when crawling the Web on behalf of disconnected households, FiDO is able to provide an average of 69 Web pages to each household (where 73% of a household's most browsed Web domains are represented by the content collected on their behalf). We then describe some of the challenges associated with content creation and data collection in challenging contexts and introduce Open Data Kit (ODK) Submit and VillageShare for rural schools. ODK Submit is a smartphone-based platform that sits between data collection applications and the network interfaces of a device [26]. It seeks to ease the burden of navigating heterogeneous network conditions for application developers, data collectors, and data processors. Principles from ODK Submit were incorporated into the publicly available ODK v. 2.0 tool suite as part of the Aggregate Tables Extensions suite [143]. In addition, we introduce VillageShare for rural schools, which enables schools in poorly connected rural areas to create and share culturally relevant curricula and empowers students to work collaboratively on "local cloud-based" projects despite their lack of network connectivity at home. We provide an evaluation of VillageShare that has been informed and parameterized by the deployment of Internet connectivity to rural schools over high-latency, low-bandwidth technology in South Africa.

We conclude with an overview of our key findings as well as a discussion of future research directions inspired by the work in this dissertation.
16.
Evaluation of find-similar with simulation and network analysis. Smucker, Mark D. 01 January 2008.
Every day, people use information retrieval (IR) systems to find documents that satisfy their information needs. Even though IR has revolutionized the way people find information, IR systems can still fail to satisfy people's information needs. In this dissertation, we show how the addition of a simple user interaction mechanism, find-similar, can improve retrieval quality by making it easier for users to navigate from relevant documents to other relevant documents. Find-similar allows a user to request documents similar to a given document. In the first part of the dissertation, we measure find-similar's retrieval potential through simulation of a user's behavior with hypothetical user interfaces. We show that find-similar has the potential to improve the retrieval quality of a state-of-the-art IR system by 23% and to match the performance of relevance feedback. In a case study showing how find-similar can help PubMed users find relevant documents, we also show how find-similar responds to varying initial conditions and compensates for poor retrieval quality. In the second part of the dissertation, we characterize find-similar in the absence of a particular user interface by measuring the quality of the document networks formed by find-similar's document-to-document similarity measure. Find-similar effectively creates links between documents that allow the user to navigate documents by similarity. We show that find-similar's similarity measure affects the navigability of the document network and that a query-biased similarity measure can improve find-similar. We develop measures of network navigability and show that find-similar should make the World Wide Web more navigable. Taken together, the simulation of find-similar and the measurement of the navigability of document networks show how find-similar, as a simple user interaction mechanism, can improve a user's ability to find relevant documents.
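As an illustrative aside (not the dissertation's system), a find-similar operation can be sketched with a generic document-to-document similarity measure such as TF-IDF cosine similarity; the dissertation studies such measures, including query-biased variants, but the documents and the specific measure below are assumptions for the example:

```python
# Sketch only: rank documents by similarity to a chosen "seed" document.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "information retrieval systems rank documents for a user query",
    "relevance feedback expands the query with terms from relevant documents",
    "users navigate from one relevant document to other similar documents",
    "protein sequence alignment is a different topic entirely",
]

tfidf = TfidfVectorizer().fit_transform(docs)
seed = 2                                   # the document the user found relevant
scores = cosine_similarity(tfidf[seed], tfidf).ravel()

# Present the most similar documents first, skipping the seed itself.
for i in scores.argsort()[::-1]:
    if i != seed:
        print(f"{scores[i]:.3f}  {docs[i]}")
```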
17.
Incident threading in news. Feng, Ao. 01 January 2008.
With an overwhelming volume of news reports currently available, there is an increasing need for automatic techniques to analyze and present news to a general reader in a meaningful and efficient manner. Previous research has focused primarily on organizing news stories into a list of clusters by the main topics that they discuss. We believe that viewing a news topic as a simple collection of stories is restrictive and inefficient for a user hoping to understand the information quickly. As a proposed solution to the automatic news organization problem, we introduce incident threading in this thesis. All text that describes the occurrence of a real-world happening is merged into a news incident, and incidents are organized in a network with dependencies of predefined types. In order to simplify the implementation, we start with the common assumption that a news story is coherent in content. In the story threading system, a cluster of news documents discussing the same topic is further grouped into smaller sets, each representing a separate news event. Binary links are established to reflect the contextual information among those events. Experiments in story threading show promising results. We next describe an enhanced version called relation-oriented story threading that extends the range of the prior work by assigning type labels to the links and describing the relation within each story pair as a competitive process among multiple options. The quality of links is greatly improved with a global optimization process. Our final approach, passage threading, removes the story-coherence assumption by conducting passage-level processing of news. First, we develop a new testbed for this research and extend the evaluation methods to address new issues. Next, a calibration study demonstrates that an incident network helps reading comprehension with an accuracy of 25-30% in a matrix comparison evaluation. Then a new three-stage algorithm is described that identifies on-subject passages, groups them into incidents, and establishes links between related incidents. Finally, significant improvement over earlier work is observed when the training phase optimizes the harmonic mean of various evaluation measures, and the performance meets the goal in the calibration study.
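As an illustrative aside (not the thesis algorithm), the passage-threading stages of grouping passages into incidents and linking related incidents can be sketched with simple textual-similarity heuristics; the passages and both thresholds below are hypothetical placeholders:

```python
# Sketch only: group passages into incidents, then link related incidents.
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

passages = [
    "An earthquake struck the coastal city early on Monday morning.",
    "Rescue teams searched collapsed buildings in the coastal city after the earthquake.",
    "The government pledged emergency funds for victims of the earthquake.",
    "A separate storm flooded several towns in the north of the country.",
]

sims = cosine_similarity(TfidfVectorizer().fit_transform(passages))

# Stage 1: connect highly similar passages; connected components become incidents.
G = nx.Graph()
G.add_nodes_from(range(len(passages)))
G.add_edges_from((i, j) for i in range(len(passages))
                 for j in range(i + 1, len(passages)) if sims[i, j] > 0.25)
incidents = [sorted(c) for c in nx.connected_components(G)]

# Stage 2: link incident pairs whose passages are weakly related.
links = [(a, b) for a in range(len(incidents)) for b in range(a + 1, len(incidents))
         if max(sims[i, j] for i in incidents[a] for j in incidents[b]) > 0.05]

print("incidents:", incidents)
print("links between incidents:", links)
```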
18.
Applications and extensions of pClust to big microbial proteomic data. Lockwood, Svetlana. 19 July 2016.
The goal of the biological sciences is to understand the biomolecular mechanics of living organisms. Proteins serve as the foundation for the functional analysis of organisms, and sequence analysis has been shown to be invaluable in answering questions about individual organisms. The first step in any sequence analysis is alignment, and it is common for even modestly sized studies to involve hundreds of thousands of protein sequences.

In multigenome studies, the time required for sequence alignment becomes paramount, and heuristic algorithms are frequently used, sacrificing accuracy for speed. At the same time, new algorithms have appeared that provide not only highly efficient performance but also guaranteed optimal solutions. However, the adoption of these algorithms is hindered by the absence of a generalized analysis pipeline and of user-friendly computational tools. In this dissertation we present applications of existing, computationally efficient algorithms to multigenome studies, applying our pClust pipeline to various sets of microbial organisms. The computational time is significantly improved, and the results are more accurate than those obtained by traditional methods.

The first study is a baseline comparison on a small set of 11 microorganisms. It compares pClust results to existing scientific knowledge and finds them consistent while also providing new insights.

The second study addresses the identification of common tick-transmissibility mechanisms across different species. It involves a larger set of 108 microbial genomes with approximately 127K protein sequences. Traditionally, a study of this scope would have required days, or at least hours, of CPU time on high-performance computers to produce an all-versus-all sequence alignment. Using pClust, sequence alignment and clustering took less than 10 minutes on a desktop computer. For this study we also developed a graphical user interface for pClust to make the new algorithms more accessible to microbiologists.

The third study analyzes the set of all proteobacterial genomes, comprising 2,326 complete genomes containing 8.7M protein sequences. The alignment was performed using the pGraph-Tascel algorithm on high-performance computers. This is the first study of its kind.
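As an illustrative aside (not pClust or pGraph-Tascel), the underlying task of all-versus-all protein alignment followed by clustering of high-scoring pairs can be sketched with Biopython; the sequences, scoring parameters, and threshold below are hypothetical placeholders, and real studies operate on vastly larger sets:

```python
# Sketch only: all-versus-all pairwise alignment, then cluster strong hits.
from itertools import combinations
import networkx as nx
from Bio.Align import PairwiseAligner, substitution_matrices

proteins = {   # hypothetical toy sequences
    "seqA": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ",
    "seqB": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVR",
    "seqC": "MNNQRKKTARPSFNMLKRARNRVSTVSQLAKRF",
}

aligner = PairwiseAligner()                       # global alignment by default
aligner.substitution_matrix = substitution_matrices.load("BLOSUM62")
aligner.open_gap_score = -10
aligner.extend_gap_score = -1

# Build a graph whose edges are pairs with alignment score above a cutoff.
G = nx.Graph()
G.add_nodes_from(proteins)
for (n1, s1), (n2, s2) in combinations(proteins.items(), 2):
    if aligner.score(s1, s2) > 50:                # illustrative cutoff
        G.add_edge(n1, n2)

# Connected components of the hit graph act as crude protein clusters.
print([sorted(c) for c in nx.connected_components(G)])
```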
19.
Fair cost sharing auction mechanisms in last mile ridesharing. Nguyen, Duc Thien. 28 December 2013.
With the rapid growth of transportation demand in urban cities, one major challenge is to provide efficient and effective door-to-door service to passengers using the public transportation system. This is commonly known as the Last Mile problem. In this thesis, we consider a dynamic and demand-responsive mechanism for ridesharing on a non-dedicated commercial fleet (such as taxis). The problem is addressed as two sub-problems, the first of which is a special type of vehicle routing problem (VRP). The second sub-problem, which is more challenging, is to allocate the cost (i.e., the total fare) fairly among passengers. We propose auction mechanisms in which passengers submit the payments they are willing to make. We show that our bidding model is budget-balanced, fairness-preserving, and, most importantly, incentive-compatible. We also show how the winner determination problem can be solved efficiently. A series of experimental studies demonstrates the feasibility and efficiency of our proposed mechanisms.
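As an illustrative aside (not the mechanism proposed in the thesis), the flavor of budget-balanced cost sharing can be seen in a classic equal-split, Moulin-style selection rule: passengers are dropped until everyone who remains accepts an equal share of the ride cost. The cost function and bids below are hypothetical placeholders:

```python
# Sketch only: a simple budget-balanced cost-sharing selection rule.
def ride_cost(passengers):
    """Hypothetical total fare for serving a group of passengers together."""
    return 4.0 + 2.0 * len(passengers)        # base fare plus per-passenger detour cost

def select_and_price(bids):
    """Iteratively drop passengers who refuse an equal share of the cost."""
    served = set(bids)
    while served:
        share = ride_cost(served) / len(served)
        refusers = {p for p in served if bids[p] < share}
        if not refusers:
            return served, share              # shares sum exactly to the cost
        served -= refusers
    return set(), 0.0

bids = {"alice": 7.0, "bob": 4.5, "carol": 2.0}   # hypothetical willingness to pay
served, share = select_and_price(bids)
print(served, share)                              # e.g. {'alice', 'bob'} 4.0
```

Because the equal shares always sum to the ride cost, the scheme is budget-balanced; the thesis's auction mechanisms additionally address fairness and incentive compatibility in the ridesharing setting.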
20.
Alone together: a socio-technical theory of motivation, coordination and collaboration technologies in organizing for free and open source software development. Howison, James. January 2009.
Thesis (Ph.D.)--Syracuse University, 2009. / "Publication number: AAT 3381579."