Global ETD Search

21	Medical concept embedding with ontological representations Song, Lihong 28 August 2019 Learning representations of medical concepts from Electronic Health Records (EHRs) has been shown to be effective for predictive analytics in healthcare. The learned representations are expected to preserve the semantic meanings of different medical concepts, so they can be treated as features and benefit a variety of applications. Medical ontologies have also been integrated with EHR data to further improve the accuracy of various prediction tasks in healthcare. Most existing works assume that medical concepts under the same ontological category should share similar representations, which does not always hold. In particular, the categories in categorical medical ontologies were established with various factors in mind. Medical concepts under the same ontological category may not follow similar occurrence patterns in the EHR data, leading to contradictory objectives for representation learning. In addition, these existing works use only categorical ontologies. Ontologies containing multiple types of relations are also available, but studies rarely make use of these diverse types of medical ontologies. In this thesis research, we propose three novel representation learning models that integrate EHR data and medical ontologies for predictive analytics. To improve interpretability and alleviate the conflicting objectives between the EHR data and medical ontologies, we propose techniques to learn medical concept embeddings with multiple ontological representations. To reduce the reliance on labeled data, we treat the co-occurrence statistics of clinical events as additional training signals, which help us learn good representations even with little labeled data.
To leverage diverse domain knowledge, we also consider multiple medical ontologies (CCS, ATC and SNOMED-CT) and propose corresponding attention mechanisms to make the best use of the medical ontologies with better interpretability. Our proposed models produce final medical concept representations that align better with the EHR data. We conduct extensive experiments, and our empirical results demonstrate the effectiveness of the proposed methods. Keywords: Bio/Medicine, Healthcare-AI, Electronic Health Record, Representation Learning, Machine Learning Applications
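Editorial illustration for entry 21: the abstract mentions using co-occurrence statistics of clinical events as a training signal. The following is a minimal Python sketch of how such statistics might be counted, assuming each visit is a list of concept codes. The function name `cooccurrence_counts` and the example codes are hypothetical and are not taken from the thesis, which does not specify its counting procedure here.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(visits):
    """Count how often each unordered pair of codes appears in the same visit."""
    counts = Counter()
    for codes in visits:
        # Deduplicate and sort so each pair has one canonical (a, b) key.
        for a, b in combinations(sorted(set(codes)), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical visits; real EHR data would use coded concepts (e.g. diagnosis codes).
visits = [
    ["diabetes", "hypertension", "metformin"],
    ["diabetes", "metformin"],
    ["hypertension", "lisinopril"],
]
counts = cooccurrence_counts(visits)
```

Counts like these could serve as unsupervised targets alongside labeled prediction tasks; how the thesis weights or normalizes them is not stated in the abstract.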
22	DIFFERENTIAL PRIVACY IN DISTRIBUTED SETTINGS Zitao Li (14135316) 18 November 2022 Data is considered the "new oil" in the information society and digital economy. While many commercial activities and government decisions are based on data, the public raises more concerns about privacy leakage when their private data are collected and used. In this dissertation, we investigate the privacy risks in settings where the data are distributed across multiple data holders and there is only an untrusted central server. We provide solutions for several problems under this setting with a security notion called differential privacy (DP). Our solutions guarantee that there is only limited and controllable privacy leakage from the data holders, while the utility of the final results, such as model prediction accuracy, can still be comparable to that of the non-private algorithms. First, we investigate the problem of estimating the distribution over a numerical domain while satisfying local differential privacy (LDP). Our protocol prevents privacy leakage in the data collection phase, in which an untrusted data aggregator (or a server) wants to learn the distribution of private numerical data among all users. The protocol consists of 1) a new reporting mechanism called the square wave (SW) mechanism, which randomizes the user inputs before sharing them with the aggregator; 2) an Expectation Maximization with Smoothing (EMS) algorithm, which is applied to aggregated histograms from the SW mechanism to estimate the original distributions. Second, we study the matrix factorization problem in three federated learning settings with an untrusted server, i.e., vertical, horizontal, and local federated learning settings. We propose a generic algorithmic framework for solving the problem in all three settings.
We show how to adapt the algorithm into differentially private versions to prevent privacy leakage in the training and publishing stages. Finally, we propose an algorithm for solving the k-means clustering problem in vertical federated learning (VFL). A big challenge in VFL is the lack of a global view of each data point. To overcome this challenge, we propose a lightweight and differentially private set intersection cardinality estimation algorithm based on the Flajolet-Martin (FM) sketch to convey the weight information of the synopsis points. We provide theoretical utility analysis for the cardinality estimation algorithm and further refine it for better empirical performance. Data and information privacy Differential Privacy federated learning applications
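Editorial illustration for entry 22: the abstract refers to set intersection cardinality estimation based on the Flajolet-Martin (FM) sketch. The following is a minimal, non-private Python sketch of the textbook FM estimator with inclusion-exclusion for intersections. It is not the dissertation's algorithm and adds no differential privacy noise. The function names, the use of SHA-256 as the hash, and the choice of 64 sketches are all assumptions made for illustration.

```python
import hashlib

PHI = 0.77351        # standard Flajolet-Martin bias-correction constant
NUM_SKETCHES = 64    # assumed number of independent bitmaps to average over
BITS = 32            # bits of hash output used per bitmap


def _hash(item, salt):
    """Deterministic 32-bit hash of (salt, item), built from SHA-256."""
    digest = hashlib.sha256(f"{salt}:{item}".encode()).digest()
    return int.from_bytes(digest[:4], "big")


def _lowest_set_bit(x):
    """Index of the least-significant 1 bit (BITS - 1 if x is zero)."""
    if x == 0:
        return BITS - 1
    return (x & -x).bit_length() - 1


def fm_sketch(items):
    """Build NUM_SKETCHES bitmaps; each item sets one bit per bitmap."""
    bitmaps = [0] * NUM_SKETCHES
    for item in items:
        for s in range(NUM_SKETCHES):
            bitmaps[s] |= 1 << _lowest_set_bit(_hash(item, s))
    return bitmaps


def merge(a, b):
    """Sketch of the union of two sets: bitwise OR of their bitmaps."""
    return [x | y for x, y in zip(a, b)]


def _lowest_zero_bit(bitmap):
    """Index of the least-significant 0 bit."""
    r = 0
    while bitmap & (1 << r):
        r += 1
    return r


def estimate(bitmaps):
    """Estimate distinct count as 2^(mean lowest-zero index) / PHI."""
    mean_r = sum(_lowest_zero_bit(b) for b in bitmaps) / len(bitmaps)
    return 2 ** mean_r / PHI


def intersection_estimate(a, b):
    """Inclusion-exclusion: |A ∩ B| ≈ |A| + |B| - |A ∪ B| (noisy)."""
    return estimate(a) + estimate(b) - estimate(merge(a, b))
```

Because sketches merge with a bitwise OR, two parties can exchange small bitmaps instead of raw identifiers. The intersection estimate subtracts noisy quantities, so it is only rough for small overlaps.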
23	Analyzing and Improving Security-Enhanced Communication Protocols Weicheng Wang (17349748) 08 November 2023 Security and privacy are among the top concerns when experts select communication protocols. When a protocol is confirmed to have problems, such as leaking users’ privacy, the protocol developers will upgrade it to an advanced version that addresses those concerns within a short interval, or the protocol will be discarded or replaced by other, secure ones. There are always communication protocols that fail to protect users’ privacy or expose users’ accounts under attack. A malicious user or an attacker can exploit the vulnerabilities in the protocol to gain private information, or even take control of the users’ devices. Hence, it is important to expose those protocols and improve them to enhance their security properties. Some protocols protect users’ privacy but in a less efficient way. With new cryptographic techniques or modern hardware support, the protocols can be improved with less overhead and enhanced security protection. In this dissertation, we focus on analyzing and improving security-enhanced communication protocols in three aspects: (1) We systematically analyzed an existing and widely used communication protocol: Zigbee. We identified the vulnerabilities of the existing Zigbee protocols during the new device joining process and proposed a security-enhanced Zigbee protocol. The new protocol uses public-key primitives with little extra overhead and is able to protect against the outsourced attackers. The new protocol is formally verified and implemented as a prototype. (2) We explored one type of communication detection system: keyword-based deep packet inspection. The system has several protocols, such as BlindBox, PrivDPI, PE-DPI, mbTLS, and so on. We analyzed those protocols and identified their vulnerabilities or inefficiencies.
To address those issues, we proposed three enhanced protocols, MT-DPI, BH-DPI, and CE-DPI, which work readily with deployed AES-based encryption schemes and are well supported by AES-NI. Specifically, MT-DPI uses multiplicative triples to support multi-party computation. (3) We developed a technique to support distributed confidential computing with the use of a trusted execution environment. We found that existing confidential computing cannot handle multiple-stakeholder scenarios well and does not give reasonable control over derived data after computation. We analyzed six real use cases and pointed out what is missing in the existing solutions. To bridge the gap, we developed a language, SeDS policy, built on top of the trusted execution environment. It works well for specific privacy needs during the collaboration and gives protection over the derived data. We examined the language in the use cases and showed the benefits of applying the new policies. Data and information privacy Data security and protection security privacy communication protocol
24	Data mining using the crossing minimization paradigm Abdullah, Ahsan January 2007 Our ability and capacity to generate, record and store multi-dimensional, apparently unstructured data is increasing rapidly, while the cost of data storage is going down. The recorded data is not perfect, as noise is introduced into it from different sources. Some of the basic forms of noise are incorrectly recorded values and missing values. The formal study of discovering useful hidden information in data is called Data Mining. Because of the size and complexity of the problem, practical data mining problems are best attempted using automatic means. Data Mining can be categorized into two types, i.e., supervised learning or classification, and unsupervised learning or clustering. Clustering only the records in a database (or data matrix) gives a global view of the data and is called one-way clustering. For a detailed analysis or a local view, biclustering (also called co-clustering or two-way clustering) is required, involving the simultaneous clustering of the records and the attributes. In this dissertation, a novel, fast and white-noise-tolerant data mining solution is proposed based on the Crossing Minimization (CM) paradigm; the solution works for one-way as well as two-way clustering for discovering overlapping biclusters. For decades the CM paradigm has been used for graph drawing and VLSI (Very Large Scale Integration) circuit design to reduce wire length and congestion. The utility of the proposed technique is demonstrated by comparing it with other biclustering techniques using simulated noisy data as well as real data from Agriculture, Biology and other domains. Two other interesting and hard problems also addressed in this dissertation are (i) the Minimum Attribute Subset Selection (MASS) problem and (ii) the Bandwidth Minimization (BWM) problem of sparse matrices.
The proposed CM technique is demonstrated to provide very convincing results when applied to these problems using real public domain data. Pakistan is the fourth largest supplier of cotton in the world. An apparent anomaly was observed during 1989-97 between cotton yield and pesticide consumption in Pakistan, showing unexpected periods of negative correlation. By applying the indigenous CM technique for one-way clustering to real Agro-Met data (2001-2002), a possible explanation of the anomaly is presented in this thesis. 005.3
25	Geografické informační systémy / Geographic information systems Vodička, Ondřej January 2010 The diploma thesis focuses on geographic information systems (GIS). The first part of the thesis introduces GIS, describes their specifics and emphasizes the significance of standardization in the GIS industry. The second part describes the current situation on the GIS market. GIS software is divided into categories according to the functionality provided, and also into open source and commercial products. Based on these categories, individual software products are introduced. The next part deals separately with GIS products offered by the Oracle corporation. The last part provides various possibilities, suggestions and recommendations for designing a GIS architecture using ESRI products.
26	Sustainable intermodal freight transportation : applying the geospatial intermodal freight transport model / Comer, Bryan. January 2009 Thesis (M.S.)--Rochester Institute of Technology, 2009. / Typescript. Includes bibliographical references.
27	A report of an administrative analysis of a police car reporting system Speed, Oscar. January 1961 Thesis (Ph. D.)--University of Southern California.
28	Semantic interoperability of geospatial ontologies: a model-theoretic analysis / Farrugia, James A. January 2007 Thesis (Ph.D.) in Spatial Information Science and Engineering--University of Maine, 2007. / Includes vita. Includes bibliographical references (leaves 145-153).
29	Combining geospatial and temporal ontologies / Joshi, Kripa, January 2007 Thesis (M.S.) in Spatial Information Science and Engineering--University of Maine, 2007. / Includes vita. Includes bibliographical references (leaves 108-112).
30	Using statistical and knowledge-based approaches for literature-based discovery / Yildiz, Meliha Yetisgen. January 2007 Thesis (Ph. D.)--University of Washington, 2007. / Vita. Includes bibliographical references (leaves 97-103).
