1 |
Privacy in Complex Sample Based Surveys
Shawn A Merrill (11806802), 20 December 2021
In the last few decades, there has been a dramatic increase in problems related to protecting user privacy in released data, both in statistical databases and in anonymized records. Privacy-preserving data publishing is a field established to handle these releases while avoiding the problems that plagued many earlier attempts. This issue is of particular importance for governmental data, where both the release and the privacy requirements are frequently governed by legislation (e.g., HIPAA, FERPA, the Clery Act). The problem is compounded by the complex survey methods employed to counter problems in data collection. The preeminent definition of privacy is differential privacy, which protects users by limiting the impact that any individual can have on the result of any query.

The thesis proposes models for differentially private versions of current survey methodologies and discusses the evaluation of those models. We focus on the handling of missing data and on weighting, two common techniques employed in complex surveys to counter problems with sampling and response rates. First, we propose a model for answering queries on datasets with missing data while maintaining differential privacy. Our model uses k-Nearest Neighbor imputation to replicate donor values while protecting the privacy of the donor. In realistic experiments using existing data, our model provides significantly better bias reduction, and it adds less noise than a naive solution. Our second model performs Iterative Proportional Fitting (IPF), a common technique used to ensure that survey records are weighted consistently with known values, in a differentially private manner. We also focus on the general philosophical need to incorporate privacy when creating new survey methodologies, rather than assuming that privacy can simply be added at a later step.
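To make the weighting step concrete, here is a minimal sketch of ordinary (non-private) Iterative Proportional Fitting on a two-way table. It is not the differentially private method the thesis proposes. The function name `ipf`, the sample counts, and the target margins are illustrative assumptions, not values from the thesis.

```python
def ipf(table, row_targets, col_targets, max_iter=100, tol=1e-9):
    """Rescale a 2-D table of weights until its margins match known totals.

    Plain IPF only; no privacy mechanism is applied here.
    """
    # Work on a float copy so the caller's table is left untouched.
    weights = [[float(w) for w in row] for row in table]
    for _ in range(max_iter):
        # Step 1: scale each row so that it sums to its known row total.
        for i, target in enumerate(row_targets):
            row_sum = sum(weights[i])
            if row_sum > 0:
                factor = target / row_sum
                weights[i] = [w * factor for w in weights[i]]
        # Step 2: scale each column so that it sums to its known column total.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in weights)
            if col_sum > 0:
                factor = target / col_sum
                for row in weights:
                    row[j] *= factor
        # Step 2 disturbs the row sums; stop once they are back within tolerance.
        max_err = max(abs(sum(weights[i]) - t) for i, t in enumerate(row_targets))
        if max_err < tol:
            break
    return weights


# Hypothetical sample counts (rows: age group, columns: region).
sample_counts = [[40, 30], [35, 50]]
# Hypothetical known population totals; both sets of margins sum to 450.
fitted = ipf(sample_counts, row_targets=[150, 300], col_targets=[200, 250])
```

The fitted cell values can be divided by the original sample counts to obtain per-cell survey weights. A differentially private variant would additionally have to control how much any single respondent can change these weights, which this sketch does not attempt.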
|
2 |
ON DATA UTILITY IN PRIVATE DATA PUBLISHING
Zhang, Yihua, 04 May 2010
No description available.
|
3 |
Towards a Privacy Preserving Framework for Publishing Longitudinal Data
Sehatkar, Morvarid, January 2014
Recent advances in information technology have enabled public organizations and corporations to collect and store huge amounts of individuals' data in data repositories. Such data are powerful sources of information about individuals' lives, such as their interests, activities, and finances. Corporations can employ data mining and knowledge discovery techniques to extract useful knowledge and interesting patterns from large repositories of individuals' data. The extracted knowledge can be exploited to improve strategic decision making, enhance business performance, and improve services. However, person-specific data often contain sensitive information about individuals, and publishing such data poses potential privacy risks. To deal with these privacy issues, data must be anonymized so that no sensitive information about individuals can be disclosed from the published data, while distortion is minimized to keep the data useful in practice. In this thesis, we address privacy concerns in publishing longitudinal data. A data set is longitudinal if it contains information about the same observation or event for individuals, collected at several points in time. For instance, a data set recording multiple visits of a hospital's patients over a period of time is longitudinal. Due to temporal correlations among the events of each record, the potential background knowledge of adversaries about an individual has specific characteristics in the context of longitudinal data. None of the previous anonymization techniques can effectively protect longitudinal data against an adversary with such knowledge. In this thesis, we identify the potential privacy threats to longitudinal data and propose a novel framework of anonymization algorithms that protects individuals' privacy against both identity disclosure and attribute disclosure while preserving data utility.
In particular, we propose two privacy models, (K,C)^P-privacy and (K,C)-privacy, and for each of these models we propose efficient algorithms for anonymizing longitudinal data. An extensive experimental study demonstrates that our proposed framework can effectively and efficiently anonymize longitudinal data.
|
4 |
Rigorous and Flexible Privacy Protection Framework for Utilizing Personal Spatiotemporal Data / 個人時空間データ利活用のための厳密で柔軟なプライバシ保護フレムワーク
Yang, Cao, 23 March 2017
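Kyoto University / 0048 / New-system course doctorate / Doctor of Informatics / Degree No. Kō 20508 / Informatics Doctorate No. 636 / 新制||情||110 (University Library) / Department of Social Informatics, Graduate School of Informatics, Kyoto University / (Chief examiner) Professor Masatoshi Yoshikawa, Professor Katsumi Tanaka, Professor Yasuo Okabe / Pursuant to Article 4, Paragraph 1 of the Degree Regulations / DFAM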
|
5 |
Duomenų bazės turinio publikavimo interaktyviuose tinklapiuose galimybių tyrimas / Research on the Possibilities of Publishing Data Base content in Interactive Web Pages
Selickas, Tomas, 31 August 2011
Websites often hold a great deal of important data that is not presented in an easily understandable form. This lack of interactivity not only makes the information harder to understand and absorb, but is also directly linked to a decline in visitor traffic. The problem is especially relevant for websites that collect and publish large amounts of domain-specific data. This master's thesis aims to adapt the Exhibit tool for the correct and full-featured publishing and visualization of data stored in an information system's database. An analysis of existing solutions showed that Exhibit is designed to work with static information, that is, information stored in files. In addition, building Exhibit's internal data structure takes a fairly long time [Zhao et al., 2008]. The thesis therefore identifies and applies means that allow Exhibit to display frequently changing, continuously updated information, and it improves a method so that Exhibit's internal data structure can be built faster.
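As a rough illustration of feeding Exhibit from a live database instead of a static file, the sketch below builds an Exhibit-style JSON document from a query result each time it is called. It is not the method developed in the thesis. It assumes the common Exhibit JSON layout with a top-level `items` list whose entries carry `label` and `type` fields. The function name `rows_to_exhibit_json`, the `site` table, and its rows are hypothetical.

```python
import json
import sqlite3


def rows_to_exhibit_json(conn, query, item_type):
    """Run a query and wrap its rows in an Exhibit-style {"items": [...]} document.

    The query is expected to expose a column named "label" for each item.
    """
    conn.row_factory = sqlite3.Row  # rows become addressable by column name
    items = []
    for row in conn.execute(query):
        item = dict(row)           # column name -> value
        item["type"] = item_type   # item type assumed to be what Exhibit groups by
        items.append(item)
    return json.dumps({"items": items}, ensure_ascii=False)


# Hypothetical in-memory database standing in for the information system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE site (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO site VALUES (?, ?)",
    [("Castle", "Vilnius"), ("Museum", "Kaunas")],
)
payload = rows_to_exhibit_json(
    conn, "SELECT name AS label, city FROM site ORDER BY name", "Site"
)
```

Serving `payload` from a web endpoint, rather than writing it to a file once, is one way the published view could follow changes in the database. Caching the generated document would be a separate design decision.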
|
6 |
Data Management and Publishing Behaviour in Academic Archaeology: A Study at the Department of Archaeology and Ancient History at Uppsala University / Datahanterings- och publiceringsbeteenden inom Arkeologisk forskning: En studie från institutionen för Arkeologi och Antik Historia på Uppsala universitet
Burén, Frida, January 2022
This study examines researchers' data management and publishing behaviours within archaeology by interviewing researchers in the field as well as data management and publishing specialists. It takes a socio-cultural perspective, and the aim is to understand the elements that influence the decision to publish research data within the field and the publishing needs that researchers in archaeology currently have. Each informant has an academic background in archaeology and a clear connection to Uppsala University in Sweden. Special focus is placed on the Digital Scientific Archive (DiVA). Each researcher interviewed for this study was found through publications of primary datasets in DiVA, and perceptions of publishing options are considered an element shaping the intention to publish and the behavioural outcome.

Documentation practices are becoming more of a requirement within the field, and perceptions of data management professionalism have a strong effect on archaeological research and data publishing, which is closely linked to the standard set for data quality. This study is interested in how archaeological data management and publishing behaviour influence the professional interpretation of human history and prehistory. The Theory of Planned Behaviour is applied in order to examine researchers' data publishing behaviours thoroughly. The theory assumes that behavioural intention, which is shaped by beliefs about a certain behaviour, is the strongest factor determining actual behaviour. This assumption is evaluated against the informants' accounts by investigating researchers' perspectives on data publishing in relation to their actual behaviours around, and experiences of, the publishing process. This is a two-year master's thesis in Library and Information Science.
|
7 |
Task Oriented Privacy-preserving (TOP) Technologies Using Automatic Feature Selection
Jafer, Yasser, January 2016
A large amount of digital information collected and stored in datasets creates vast opportunities for knowledge discovery and data mining. These datasets, however, may contain sensitive information about individuals and, therefore, it is imperative to ensure that their privacy is protected.
Most research in the area of privacy-preserving data publishing makes no assumptions about the analysis task that will be applied to the dataset. In many domains, such as healthcare and finance, however, it is possible to identify the analysis task beforehand. Incorporating such knowledge of the ultimate analysis task may improve the quality of the anonymized data while protecting the privacy of individuals. Furthermore, the existing research that does consider the ultimate analysis task (e.g., classification) is not suitable for high-dimensional data.
We show that automatic feature selection (a well-known dimensionality reduction technique) can be used to address privacy and utility simultaneously. In doing so, we show that feature selection can enhance existing privacy-preserving techniques addressing k-anonymity and differential privacy, protecting privacy while reducing the amount of modification applied to the dataset and hence, in most cases, achieving higher utility.
We consider incorporating the concept of privacy-by-design within the feature selection process. We propose techniques that turn filter-based and wrapper-based feature selection into privacy-aware processes. To this end, we build a layer of privacy on top of the regular feature selection process and obtain a privacy-preserving feature selection that is guided not only by accuracy but also by the amount of protected private information.
In addition to considering privacy after feature selection, we introduce a framework for a privacy-aware feature selection evaluation measure. That is, we incorporate privacy during feature selection and obtain a list of candidate privacy-aware attribute subsets that consider (and satisfy) both efficacy and privacy requirements simultaneously.
Finally, we propose a multi-dimensional, privacy-aware evaluation function which incorporates efficacy, privacy, and dimensionality weights and enables the data holder to obtain a best attribute subset according to its preferences.
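The interaction between feature selection and k-anonymity described above can be illustrated with a small check. This is a minimal sketch and not the thesis's algorithm. The function name `is_k_anonymous`, the attribute names, and the four records are hypothetical. A table is treated as k-anonymous when every combination of quasi-identifier values is shared by at least k records.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs at least k times."""
    groups = Counter(
        tuple(record[attr] for attr in quasi_identifiers) for record in records
    )
    return all(count >= k for count in groups.values())


# Hypothetical records; "diagnosis" plays the role of the sensitive attribute.
records = [
    {"age": 34, "zip": "A", "diagnosis": "flu"},
    {"age": 35, "zip": "A", "diagnosis": "asthma"},
    {"age": 34, "zip": "B", "diagnosis": "flu"},
    {"age": 35, "zip": "B", "diagnosis": "diabetes"},
]

# With both attributes retained, every (age, zip) pair is unique.
both_attributes_ok = is_k_anonymous(records, ["age", "zip"], k=2)

# If feature selection drops "age", each zip value is shared by two records.
zip_only_ok = is_k_anonymous(records, ["zip"], k=2)
```

In this toy table, dropping one attribute reaches 2-anonymity without generalizing or suppressing any remaining value. That is the kind of reduction in modifications the abstract refers to, though whether a dropped attribute matters for the analysis task is precisely what a utility measure has to decide.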
|
8 |
GARBLED COMPUTATION: HIDING SOFTWARE, DATA AND COMPUTED VALUES
Shoaib Amjad Khan (19199497), 27 July 2024
This thesis presents an in-depth study and evaluation of a class of secure multiparty protocols that enable execution of a confidential software program $\mathcal{P}$ owned by Alice on confidential data $\mathcal{D}$ owned by Bob, without revealing anything about $\mathcal{P}$ or $\mathcal{D}$ in the process. Our initial adversarial model is an honest-but-curious adversary, which we later extend to a malicious adversarial setting. Depending on the requirements, our protocols can be set up such that the output $\mathcal{P}(\mathcal{D})$ is learned only by Alice, only by Bob, by both, or by neither (in which case an agreed-upon third party would learn it). Most of our protocols are run by only two online parties, which can be Alice and Bob or, alternatively, two commodity cloud servers (in which case neither Alice nor Bob participates in the protocols' execution; they merely initialize the two cloud servers and then go offline). We implemented and evaluated some of these protocols as prototypes that we made available to the open source community via GitHub. We report experimental findings that compare and contrast the viability of our various approaches with one another and with existing ones. All our protocols achieve these goals without revealing anything other than upper bounds on the sizes of the program and the data.
|
9 |
Metadados nas instruções de governos para publicadores de dados / Metadata in the government instructions for data publishers / Metadatos en las instrucciones de gobiernos para publicadores de datos
Camperos Reyes, Jacquelin Teresa [UNESP], 29 January 2018
Generating value for society from the abundance of government data has become imperative in data availability strategies, in which data are published as datasets. Datasets, tabulated data with a certain structure, are an example of collections of databases that aim to succeed by establishing themselves as central catalogs from the citizens' point of view, increasing visibility into, and of, the actions of public management. Structuring these informational resources in a way that helps revalue them is one of the challenges of Information Science. The research question is: how is adherence to the use of metadata being addressed in the instructions given to data publishers in governments? The objective is to describe adherence to the use of metadata in the datasets of national governments, based on the context and conceptual framework presented in the instructions for data publishers found on the official open data sites of the analyzed countries. It is believed that studies such as this one can provide elements that support government strategies, addressing social dimensions from the standpoint of information professionals. This is descriptive research of a qualitative nature, focused on critical observation of the documents that address the descriptive treatment of governmental datasets. The procedures used are bibliographic and documentary analysis and the definition of case studies in Colombia, Brazil, Spain and Portugal, with the volume of data and information handled through the technique of content analysis. The effort made by the holders of the documents made available in the four analyzed countries is evident in the broad coverage of thematic content related to the experimental use of metadata, which gives greater importance to the practical aspect than to the theoretical one, without disregarding the relevance of the theoretical explanations. The creation and implementation of application profiles among communities of countries is believed to be important, as in the case of DCAT-AP, created and recommended by the European community of nations and suggested by the data sites of the countries studied. Concerns are raised regarding the processes of publishing government data and their relations with other topics of socio-economic interest, such as possible links with development indicators in countries and regions, from the perspective of research originating in Information Science.
|