Differential Privacy query tips and examples
AWS Clean Rooms Differential Privacy uses a general-purpose query structure to support a wide
variety of SQL constructs such as Common Table Expressions (CTEs) for data preparation and
commonly used aggregate functions such as COUNT
, or SUM
. In order to
obfuscate the contribution of any possible user in your data by adding noise to aggregate
query results at run-time, AWS Clean Rooms Differential Privacy requires that aggregate functions in the
final SELECT statement
are run on user-level data.
The following example uses two tables named socialco_impressions
and
socialco_users
from a media publisher who wants to protect data using
differential privacy while collaborating with an athletic brand with
athletic_brand_sales
data. The media publisher has configured the
user_id
column as the user identifier column while enabling differential
privacy in AWS Clean Rooms. The advertiser does not need differential privacy protection and wants to
run a query using CTEs on combined data. Since their CTE uses differential privacy protected
tables, the advertiser includes the user identifier column from those protected tables in the
list of CTE columns and joins the protected tables on the user identifier column.
WITH matches_table AS( SELECT si.user_id, si.campaign_id, s.sale_id, s.sale_price FROM socialco_impressions si JOIN socialco_users su ON su.user_id = si.user_id JOIN athletic_brand_sales s ON s.emailsha256 = su.emailsha256 WHERE s.timestamp > si.timestamp UNION ALL SELECT si.user_id, si.campaign_id, s.sale_id, s.sale_price FROM socialco_impressions si JOIN socialco_users su ON su.user_id = si.user_id JOIN athletic_brand_sales s ON s.phonesha256 = su.phonesha256 WHERE s.timestamp > si.timestamp ) SELECT COUNT (DISTINCT user_id) as unique_users FROM matches_table GROUP BY campaign_id ORDER BY COUNT (DISTINCT user_id) DESC LIMIT 5
Similarly, if you want to run window functions on differential privacy protected data
tables, you must include the user identifier column in the PARTITION BY
clause.
ROW_NUMBER() OVER (PARTITION BY conversion_id, user_id ORDER BY match_type, match_age) AS row