Function Repository Resource:

PseudonymizeData

Consistently replace sensitive data values with UUIDs

Contributed by: Jon McLoone

ResourceFunction["PseudonymizeData"][table]

replaces strings at level 2 in table with UUIDs such that repeated values are replaced with the same UUID.

ResourceFunction["PseudonymizeData"][table,partspec]

replaces data at the positions in table specified by partspec with UUIDs, such that repeated values are replaced with the same UUID.

Details

Pseudonymization does not guarantee privacy for personal information. If sufficient information is associated with pseudonymous IDs, then a determined attacker may be able to establish connections between the IDs and original information.
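
As a rough illustration of this caveat, the following sketch compares how often each value occurs in a column before and after pseudonymization. The sample data and the variable names data and pseudo are made up for this illustration. Because repeated values share a UUID, the two frequency profiles should match, so an attacker who knows how often each original value occurs could use that to guess which UUID corresponds to which value:

```wolfram
(* Made-up sample data: "Jon" appears three times, "Mike" once *)
data = {{"Jon", 1}, {"Mike", 20}, {"Jon", 2}, {"Jon", 30}};
pseudo = ResourceFunction["PseudonymizeData"][data, {All, 1}];

(* Frequency profile of the first column before pseudonymization *)
Sort[Values[Counts[data[[All, 1]]]]]

(* Frequency profile after pseudonymization; expected to match the one above *)
Sort[Values[Counts[pseudo[[All, 1]]]]]
```

Columns that are left unchanged, such as the second column here, remain attached to the same UUIDs and can give an attacker further clues.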

Examples

Basic Examples (2) 

Replace strings with UUID pseudonyms:

In[1]:=
ResourceFunction["PseudonymizeData"][{{"Jon", 1}, {"Mike", 20}, {"Jon", 2}, {"Mike", 30}}]
Out[1]=

Replace the first column with UUID pseudonyms:

In[2]:=
ResourceFunction["PseudonymizeData"][{{"Jon", 1}, {"Mike", 20}, {"Jon", 2}, {"Mike", 30}}, {All, 1}]
Out[2]=
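
The consistent mapping used in these examples can be approximated by hand. The following is a minimal sketch, not the actual implementation of PseudonymizeData: the helper name pseudonymizeColumn is hypothetical, and it handles only a single column of a list of lists. It builds one UUID per distinct value with CreateUUID and then looks each entry up in that mapping:

```wolfram
(* Hypothetical helper for illustration; not the source of PseudonymizeData *)
pseudonymizeColumn[table_List, col_Integer] := Module[{values, lookup},
  (* Collect each distinct value in the chosen column once *)
  values = DeleteDuplicates[table[[All, col]]];
  (* Assign one freshly generated UUID to each distinct value *)
  lookup = AssociationThread[values, Table[CreateUUID[], {Length[values]}]];
  (* Replace every entry in the column with its UUID from the mapping *)
  MapAt[lookup, table, {All, col}]
]

result = pseudonymizeColumn[{{"Jon", 1}, {"Mike", 20}, {"Jon", 2}, {"Mike", 30}}, 1]
```

Because the mapping is built once per call, both occurrences of "Jon" receive the same UUID, while a separate call generates new UUIDs.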

Scope (4) 

You can pseudonymize multiple columns at once; a value that appears in more than one column receives the same UUID in each:

In[3]:=
ResourceFunction["PseudonymizeData"][{
   {"Jon", "Mary", 1},
   {"Mike", "Mary", 20},
   {"Jon", "Kate", 2},
   {"Mike", "Kate", 30}
  } , {All, {1, 2}}] // TableForm
Out[3]=

PseudonymizeData can be applied to a Dataset:

In[4]:=
ResourceFunction["PseudonymizeData"][Dataset[ {
   {"Jon", 1},
   {"Mike", 20},
   {"Jon", 2},
   {"Mike", 30}
  } ], {All, 1}]
Out[4]=

If the data is a list of associations, you can use a key in place of a column index:

In[5]:=
ResourceFunction["PseudonymizeData"][Dataset[{
   <|"Name" -> "Jon", "Value" -> 1|>,
   <|"Name" -> "Mike", "Value" -> 20|>,
   <|"Name" -> "Jon", "Value" -> 2|>,
   <|"Name" -> "Mike", "Value" -> 30|>
   }], {All, "Name"}]
Out[5]=

If the first row contains headers, you can apply pseudonymization from the second row onward using Span:

In[6]:=
ResourceFunction["PseudonymizeData"][{
   {"Name", "Value"},
   {"Jon", 1},
   {"Mike", 20},
   {"Jon", 2},
   {"Mike", 30}
  } , {2 ;;, 1}] // TableForm
Out[6]=

Publisher

Jon McLoone

Version History

  • 1.0.0 – 29 January 2021
