Data Privacy - Data for Public Good

On the Optimal Number of Grids for Differentially Private Non-Interactive K-Means Clustering

Authors: Gokularam M, Anshoo Tandon
Differentially private K-means clustering enables releasing cluster centers derived from a dataset while protecting the privacy of the individuals. Non-interactive clustering techniques based on privatized histograms are attractive because the released data synopsis can be reused for other downstream tasks without additional privacy loss. The choice of the […]
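As a rough illustration of the histogram-based, non-interactive pipeline described above, the sketch below builds an m × m grid over points in [0, 1)², adds Laplace noise to each cell count, and runs weighted Lloyd iterations on the noisy cell centres. The grid size m, the unit-square domain, and all function names are assumptions for illustration. This is not the authors' method and does not implement their rule for choosing the number of grids.

```python
import random

def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_grid_histogram(points, m, epsilon, rng):
    """Release noisy counts on a hypothetical m x m grid over [0, 1)^2.

    Each point lands in exactly one cell, so adding or removing one point changes
    a single count by 1 (L1 sensitivity 1); Laplace(1/epsilon) noise gives epsilon-DP.
    """
    counts = [[0] * m for _ in range(m)]
    for x, y in points:
        counts[min(int(x * m), m - 1)][min(int(y * m), m - 1)] += 1
    synopsis = []
    for i in range(m):
        for j in range(m):
            noisy = counts[i][j] + laplace(1.0 / epsilon, rng)
            center = ((i + 0.5) / m, (j + 0.5) / m)
            # Clipping negative counts is post-processing, so privacy is unaffected.
            synopsis.append((center, max(noisy, 0.0)))
    return synopsis

def weighted_kmeans(synopsis, k, iters, rng):
    """Lloyd's algorithm on cell centres weighted by noisy counts (no extra privacy cost)."""
    centers = [c for c, _ in rng.sample(synopsis, k)]
    for _ in range(iters):
        acc = [[0.0, 0.0, 0.0] for _ in range(k)]  # weighted x, weighted y, total weight
        for (x, y), w in synopsis:
            best = min(range(k), key=lambda c: (x - centers[c][0]) ** 2 + (y - centers[c][1]) ** 2)
            acc[best][0] += w * x
            acc[best][1] += w * y
            acc[best][2] += w
        # Keep the previous centre if a cluster received no weight.
        centers = [(sx / sw, sy / sw) if sw > 0 else centers[c]
                   for c, (sx, sy, sw) in enumerate(acc)]
    return centers

rng = random.Random(0)
data = [(rng.gauss(0.25, 0.05) % 1.0, rng.gauss(0.25, 0.05) % 1.0) for _ in range(500)]
data += [(rng.gauss(0.75, 0.05) % 1.0, rng.gauss(0.75, 0.05) % 1.0) for _ in range(500)]
synopsis = private_grid_histogram(data, m=10, epsilon=1.0, rng=rng)
centers = weighted_kmeans(synopsis, k=2, iters=10, rng=rng)
print(centers)
```

Because K-means runs only on the released synopsis, it can be repeated (or replaced by another downstream task) without further privacy loss. The value of m is the crux: a coarse grid blurs the cluster centres, while a fine grid spreads the same points over more cells so that noise dominates each count.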

SKALD: Scalable K-Anonymisation for Large Datasets

Authors: K. Reddy, N. Chakraborty, A. Dharmavaram, A. Tandon
Data privacy and anonymisation are critical concerns in today's data-driven society, particularly when handling personal and sensitive user data. Regulatory frameworks worldwide recommend privacy-preserving protocols such as k-anonymisation to de-identify releases of tabular data. Available hardware resources provide an upper bound on the maximum size […]
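For readers new to k-anonymisation, a minimal sketch of the property itself: every combination of quasi-identifier values in the release must be shared by at least k records. The generalisation rule (10-year age bands, ZIP codes truncated to three digits) and the toy records are hypothetical; SKALD's scalable algorithm is not reproduced here.

```python
from collections import Counter

def generalize(record, age_width=10):
    """Hypothetical generalisation: band the age, keep the first 3 ZIP digits."""
    age, zip_code, diagnosis = record
    lo = (age // age_width) * age_width
    return (f"{lo}-{lo + age_width - 1}", zip_code[:3] + "**", diagnosis)

def is_k_anonymous(rows, k, qi_indices):
    """True if every quasi-identifier combination occurs in at least k rows."""
    class_sizes = Counter(tuple(row[i] for i in qi_indices) for row in rows)
    return all(size >= k for size in class_sizes.values())

# Toy records: (age, ZIP code, diagnosis); age and ZIP are the quasi-identifiers.
records = [
    (34, "56001", "flu"),
    (36, "56002", "asthma"),
    (38, "56003", "flu"),
    (51, "56104", "diabetes"),
    (55, "56105", "flu"),
    (58, "56109", "asthma"),
]
raw_ok = is_k_anonymous(records, k=3, qi_indices=(0, 1))   # every raw (age, ZIP) pair is unique
released = [generalize(r) for r in records]
gen_ok = is_k_anonymous(released, k=3, qi_indices=(0, 1))  # two classes of three records each
print(raw_ok, gen_ok)
```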

Improving the Privacy Loss Under User-Level DP Composition for Fixed Estimation Error

Authors: V. A. Rameshwar and A. Tandon
This paper considers the private release of statistics of several disjoint subsets of a dataset. In particular, we consider the ε-user-level differentially private release of sample means and variances of sample values in disjoint subsets of a dataset, in a potentially sequential manner. Traditional analysis of the privacy […]

(ℓ, δ)-Diversity: Linkage-Robustness via a Composition Theorem

Authors: V. A. Rameshwar and A. Tandon
In this paper, we consider the problem of degradation of anonymity upon linkages of anonymized datasets. We work in the setting where an adversary links together t ≥ 2 anonymized datasets in which a user of interest participates, based on the user's known quasi-identifiers, which motivates the use of ℓ-diversity as […]
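A toy illustration, under stated assumptions, of the linkage degradation mentioned above: two releases that are each distinct 2-diverse can still pin down a target's sensitive value once an adversary intersects the candidate values from the classes matching the target's quasi-identifiers. The records are invented, the target is assumed to hold the same sensitive value in both releases, and the paper's (ℓ, δ)-diversity notion and composition theorem are not implemented.

```python
from collections import defaultdict

def is_distinct_l_diverse(release, ell):
    """release: list of (quasi_identifier_class, sensitive_value) pairs.

    Distinct l-diversity: every equivalence class holds at least `ell` distinct sensitive values.
    """
    groups = defaultdict(set)
    for qi_class, sensitive in release:
        groups[qi_class].add(sensitive)
    return all(len(values) >= ell for values in groups.values())

def candidates(release, qi_class):
    """Sensitive values an adversary cannot rule out for a target in `qi_class`."""
    return {sensitive for q, sensitive in release if q == qi_class}

# Two hypothetical releases that generalise the same population differently.
release_a = [
    (("30-39", "560**"), "flu"),
    (("30-39", "560**"), "asthma"),
    (("40-49", "560**"), "diabetes"),
    (("40-49", "560**"), "flu"),
]
release_b = [
    (("3*", "5600*"), "flu"),
    (("3*", "5600*"), "diabetes"),
    (("4*", "5600*"), "asthma"),
    (("4*", "5600*"), "diabetes"),
]

a_ok = is_distinct_l_diverse(release_a, ell=2)
b_ok = is_distinct_l_diverse(release_b, ell=2)
# The adversary knows the target is 34 with ZIP 56001, which maps to one class per release.
linked = candidates(release_a, ("30-39", "560**")) & candidates(release_b, ("3*", "5600*"))
print(a_ok, b_ok, linked)  # each release alone is 2-diverse; the linkage leaves one value
```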

Bounding User Contributions for User-Level Differentially Private Mean Estimation

Authors: V. Arvind Rameshwar (IIT Madras) and Anshoo Tandon
We revisit the problem of releasing the sample mean of bounded samples in a dataset, privately, under user-level ε-differential privacy (DP). We aim to derive the optimal method of preprocessing data samples, within a canonical class of processing strategies, in terms of the estimation error. […]
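One generic way to bound user contributions, sketched here under assumptions: samples lie in [0, 1], neighbouring datasets differ by one user's entire data, each user keeps at most L samples, and the mean is released as a noisy sum over a noisy count with the budget split evenly. The truncation rule, the parameter L, and the function names are illustrative choices, not the optimal preprocessing derived in the paper.

```python
import random

def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def user_level_dp_mean(user_samples, L, epsilon, rng):
    """Sketch: keep at most L samples per user, then release noisy sum / noisy count.

    user_samples maps a user id to that user's samples, each assumed to lie in [0, 1].
    Under add/remove-one-user neighbours a single user moves the kept sum by at most L
    and the kept count by at most L, so each query gets Laplace(L / (epsilon / 2)) noise
    and the pair is epsilon-DP by basic composition.
    """
    kept = [s for samples in user_samples.values() for s in samples[:L]]
    scale = 2.0 * L / epsilon
    noisy_sum = sum(kept) + laplace(scale, rng)
    noisy_count = len(kept) + laplace(scale, rng)
    estimate = noisy_sum / max(noisy_count, 1.0)
    return min(max(estimate, 0.0), 1.0)  # clamping to [0, 1] is post-processing

rng = random.Random(1)
# Hypothetical data: 1000 users contributing between 1 and 20 samples each.
users = {u: [rng.random() for _ in range(rng.randint(1, 20))] for u in range(1000)}
estimates = {L: user_level_dp_mean(users, L, epsilon=1.0, rng=rng) for L in (1, 5, 20)}
print(estimates)
```

The parameter L is where the trade-off lies: a small L discards samples from heavy contributors, while a large L inflates the Laplace scale 2L/ε.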

Breaking Data Silos: How GDI is Transforming Access to Geospatial Information in India

Authors: Bryan Paul Robert, Mahidhar Chellamani, Jyotirmoy Dutta
For years, some of India's most valuable geospatial datasets remained scattered across government departments, research institutes, or private organizations. They held immense potential to transform logistics, strengthen climate resilience, and support smarter urban planning, but they remained difficult to access, buried in different formats and […]

Adaptive Self-Distillation for Minimizing Client Drift in Heterogeneous Federated Learning

Authors: M. Yashwanth, G. K. Nayak, A. Singh, Y. Simmhan, A. Chakraborty
Federated Learning (FL) is a machine learning paradigm that enables clients to jointly train a global model by aggregating the locally trained models without sharing any local training data. In practice, there can often be substantial heterogeneity (e.g., class imbalance) across […]
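To make the FL setting concrete, a minimal sketch of FedAvg-style aggregation (a client-size-weighted average of parameter vectors) together with a generic self-distillation penalty, KL(p_global ‖ p_local), that discourages a client's predictions from drifting away from the global model. Both are common textbook choices assumed for illustration; the abstract specifies neither, and the adaptive weighting that gives the paper its name is not reproduced.

```python
import math

def fedavg(client_weights, client_sizes):
    """Average parameter vectors, weighting each client by its local dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [sum(w[d] * n for w, n in zip(client_weights, client_sizes)) / total
            for d in range(dim)]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_term(global_logits, local_logits):
    """KL(p_global || p_local): grows as local predictions drift from the global model's."""
    p = softmax(global_logits)
    q = softmax(local_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Two hypothetical clients with 100 and 300 local examples.
global_model = fedavg([[1.0, 2.0], [3.0, 4.0]], [100, 300])
print(global_model)
print(distillation_term([3.0, 0.0, 0.0], [0.0, 0.0, 3.0]))
```

In a full client update, this penalty would be added, scaled by some weight, to the local cross-entropy loss; that weight is a free hyperparameter in this sketch.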

Privacy-Preserving Data Quality Assessment for Time-Series IoT Sensors

Authors: N. Chakraborty, A. Sharma, J. Dutta, H. D. Kumar
This paper proposes a novel framework for automated, objective, and privacy-preserving data quality assessment of time-series data from IoT sensors deployed in smart cities. We leverage custom, autonomously computable metrics that parameterise the temporal performance and adherence to a declarative schema document to […]

Building a Privacy Web with SPIDEr – Secure Pipeline for Information De-Identification with End-to-End Encryption

Authors: N. Chakraborty, A. Tandon, K. Reddy, K. Kirpekar, B. Robert, H. Kumar, A. Venkatesh, A. Sharma
Data de-identification makes it possible to glean insights from data while preserving user privacy. The use of Trusted Execution Environments (TEEs) allows for the execution of de-identification applications on the cloud without the need for a […]

Optimal Tree-Based Mechanisms for Differentially Private Approximate CDFs

Authors: V. A. Rameshwar, A. Tandon, and A. Sharma
This paper considers the ε-differentially private (DP) release of an approximate cumulative distribution function (CDF) of the samples in a dataset. We assume that the true (approximate) CDF is obtained after lumping the data samples into a fixed number K of bins. In this […]
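As background for tree-based CDF release, a sketch of the classic binary tree mechanism over K bins: each tree node stores the count of its dyadic range, every node receives Laplace noise scaled to the tree height (one sample touches one node per level), and each prefix count is assembled from at most one node per level. K being a power of two and normalisation by the noisy root count are simplifying assumptions; the optimal tree structures studied in the paper are not implemented.

```python
import random

def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_tree(bin_counts, epsilon, rng):
    """Binary tree over K bins (K a power of two); levels[0] holds the leaves."""
    K = len(bin_counts)
    assert K & (K - 1) == 0, "this sketch assumes K is a power of two"
    levels = [list(bin_counts)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([prev[2 * i] + prev[2 * i + 1] for i in range(len(prev) // 2)])
    # One sample contributes to one node per level, so the L1 sensitivity is len(levels).
    scale = len(levels) / epsilon
    return [[c + laplace(scale, rng) for c in level] for level in levels]

def prefix_count(levels, j):
    """Noisy count of bins 0..j, summing at most one node per level."""
    end = j + 1  # decompose the half-open leaf range [0, end) into dyadic blocks
    total = 0.0
    for h, level in enumerate(levels):
        if (end >> h) & 1:
            # Node (end >> h) - 1 at level h covers a block of 2**h leaves ending just before `end`.
            total += level[(end >> h) - 1]
    return total

def approximate_cdf(levels):
    n_hat = max(levels[-1][0], 1.0)  # noisy total from the root
    return [min(max(prefix_count(levels, j) / n_hat, 0.0), 1.0) for j in range(len(levels[0]))]

rng = random.Random(7)
counts = [5, 12, 30, 41, 28, 15, 6, 3]  # hypothetical bin counts, K = 8
cdf = approximate_cdf(noisy_tree(counts, epsilon=1.0, rng=rng))
print(cdf)
```

The released values need not be monotone; isotonic regression is one post-processing option that costs no additional privacy.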

Enhancing MOTION2NX for Efficient, Scalable and Secure Image Inference using Convolutional Neural Networks

Authors: H. Kallamadi, R. Burra, S. Mittal, S. Sharma, A. Venkatesh, A. Tandon
This work contributes towards the development of an efficient and scalable open-source Secure Multi-Party Computation (SMPC) protocol on machines with moderate computational resources. We use the ABY2.0 SMPC protocol implemented on the C++-based MOTION2NX framework for secure convolutional neural […]
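As background on what SMPC frameworks compute under the hood, a textbook sketch of two-party additive secret sharing over Z_{2^64}, with multiplication via a Beaver triple from a hypothetical trusted dealer, applied to the multiply-accumulate step of a convolution. ABY2.0 uses its own sharing and preprocessing scheme and MOTION2NX has its own C++ API; neither is reproduced here, and real inputs would additionally need a fixed-point encoding.

```python
import secrets

MOD = 2 ** 64  # shares live in the ring Z_{2^64}

def share(x):
    """Split x into two random-looking additive shares."""
    r = secrets.randbelow(MOD)
    return r, (x - r) % MOD

def reconstruct(s0, s1):
    return (s0 + s1) % MOD

def add_shares(x_sh, y_sh):
    # Each party adds its own shares locally; no communication is needed.
    return tuple((xs + ys) % MOD for xs, ys in zip(x_sh, y_sh))

def beaver_triple():
    # Hypothetical trusted dealer (offline phase): shares of random a, b and c = a * b.
    a, b = secrets.randbelow(MOD), secrets.randbelow(MOD)
    return share(a), share(b), share(a * b % MOD)

def mul_shares(x_sh, y_sh):
    a_sh, b_sh, c_sh = beaver_triple()
    # The parties open d = x - a and e = y - b; the random masks hide x and y.
    d = reconstruct(*((x - a) % MOD for x, a in zip(x_sh, a_sh)))
    e = reconstruct(*((y - b) % MOD for y, b in zip(y_sh, b_sh)))
    # x * y = c + d*b + e*a + d*e; the public term d*e is added by party 0 only.
    z0 = (c_sh[0] + d * b_sh[0] + e * a_sh[0] + d * e) % MOD
    z1 = (c_sh[1] + d * b_sh[1] + e * a_sh[1]) % MOD
    return z0, z1

# Secret dot product, the core of one convolution output: 3*2 + 1*7 + 4*1.
inputs, weights = [3, 1, 4], [2, 7, 1]
acc = share(0)
for xi, wi in zip(inputs, weights):
    acc = add_shares(acc, mul_shares(share(xi), share(wi)))
print(reconstruct(*acc))
```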