
Simultaneous discrimination prevention and privacy protection in data publishing and mining

Data mining is an increasingly important technology for extracting useful knowledge hidden
in large collections of data. There are, however, negative social perceptions about data mining,
among which are potential privacy violation and potential discrimination. The former is an
unintentional or deliberate disclosure of a user profile or activity data as part of the output
of a data mining algorithm or as a result of data sharing. For this reason, privacy preserving
data mining has been introduced to trade off the utility of the resulting data/models against
the protection of individual privacy. The latter consists of treating people unfairly on the basis
of their belonging to a specific group. Automated data collection and data mining techniques
such as classification have paved the way to making automated decisions, like loan
granting/denial, insurance premium computation, etc. If the training datasets are biased
with regard to discriminatory attributes like gender, race, religion, etc., discriminatory
decisions may ensue. For this reason, anti-discrimination techniques including discrimination
discovery and prevention have been introduced in data mining. Discrimination can be
either direct or indirect. Direct discrimination occurs when decisions are made based on
discriminatory attributes. Indirect discrimination occurs when decisions are made based
on non-discriminatory attributes which are strongly correlated with biased discriminatory
ones.
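
To make the notion of a directly discriminatory decision rule concrete, here is a minimal sketch in Python. It assumes one measure often used in the discrimination discovery literature, the extended lift of a rule; the abstract itself does not name a measure, so this choice, the function names `confidence` and `extended_lift`, and the toy loan records are all illustrative assumptions, not the thesis's method.

```python
from typing import Dict, List, Optional

Record = Dict[str, str]


def confidence(records: List[Record], premise: Record, outcome: Record) -> float:
    """Fraction of records matching `premise` that also match `outcome`."""
    covered = [r for r in records if all(r.get(k) == v for k, v in premise.items())]
    if not covered:
        return 0.0
    hits = [r for r in covered if all(r.get(k) == v for k, v in outcome.items())]
    return len(hits) / len(covered)


def extended_lift(
    records: List[Record], protected: Record, context: Record, outcome: Record
) -> Optional[float]:
    """Ratio conf(protected, context -> outcome) / conf(context -> outcome).

    Values well above 1 suggest that adding the protected item to the rule
    makes the (negative) outcome more likely. Returns None when the
    denominator is zero and the ratio is undefined.
    """
    base = confidence(records, context, outcome)
    if base == 0.0:
        return None
    return confidence(records, {**context, **protected}, outcome) / base


# Hypothetical toy data: eight loan decisions in one city.
toy_records: List[Record] = (
    [{"gender": "female", "city": "NYC", "decision": "deny"}] * 3
    + [{"gender": "female", "city": "NYC", "decision": "grant"}]
    + [{"gender": "male", "city": "NYC", "decision": "deny"}]
    + [{"gender": "male", "city": "NYC", "decision": "grant"}] * 3
)

lift = extended_lift(
    toy_records,
    protected={"gender": "female"},
    context={"city": "NYC"},
    outcome={"decision": "deny"},
)
print(lift)
```

In this toy data the denial rate is 0.75 for women and 0.5 overall, so the ratio is 1.5. Whether such a value counts as discriminatory depends on a threshold that would have to be fixed separately, for example from legal guidance.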
In the first part of this thesis, we tackle discrimination prevention in data mining and
propose new techniques applicable to direct or indirect discrimination prevention, either individually
or both at the same time. We discuss how to clean training datasets and outsourced
datasets in such a way that direct and/or indirect discriminatory decision rules are converted
to legitimate (non-discriminatory) classification rules. The experimental evaluations
demonstrate that the proposed techniques are effective at removing direct and/or indirect
discrimination biases in the original dataset while preserving data quality.
In the second part of this thesis, by presenting samples of privacy violation and potential
discrimination in data mining, we argue that privacy and discrimination risks should
be tackled together. We explore the relationship between privacy preserving data mining
and discrimination prevention in data mining to design holistic approaches capable of addressing
both threats simultaneously during the knowledge discovery process. As part of
this effort, we have investigated for the first time the problem of discrimination and privacy
aware frequent pattern discovery, i.e. the sanitization of the collection of patterns mined
from a transaction database in such a way that neither privacy-violating nor discriminatory
inferences can be drawn from the released patterns. Moreover, we investigate the problem
of discrimination and privacy aware data publishing, i.e. transforming the data, instead of
patterns, in order to simultaneously fulfill privacy preservation and discrimination prevention.
In the above cases, it turns out that the impact of our transformation on the quality
of data or patterns is the same or only slightly higher than the impact of achieving just
privacy preservation.

Identifier: oai:union.ndltd.org:TDX_URV/oai:www.tdx.cat:10803/119651
Date: 10 June 2013
Creators: Hajian, Sara
Contributors: Pedreschi, Dino; Domingo-Ferrer, Josep, 1965-; Universitat Rovira i Virgili. Departament d'Enginyeria Informàtica i Matemàtiques
Publisher: Universitat Rovira i Virgili
Source Sets: Universitat Rovira i Virgili
Language: English
Detected Language: English
Type: info:eu-repo/semantics/doctoralThesis, info:eu-repo/semantics/publishedVersion
Format: 176 p., application/pdf
Source: TDX (Tesis Doctorals en Xarxa)
Rights: info:eu-repo/semantics/openAccess. NOTICE. Access to the contents of this doctoral thesis and its use must respect the rights of the author. It may be used for consultation or personal study, as well as in research and teaching activities or materials, under the terms established in Art. 32 of the Consolidated Text of the Intellectual Property Law (RDL 1/1996). Any other use requires the prior and express authorization of the author. In any case, when using its contents, the full name of the author and the title of the doctoral thesis must be clearly indicated. Its reproduction or other forms of exploitation for profit are not authorized, nor is its public communication from a site outside the TDX service. Presenting its content in a window or frame outside TDX (framing) is also not authorized. This reservation of rights covers both the contents of the thesis and its abstracts and indexes.
