AIAnonymizer

Short description

AIAnonymizer runs a token classification model (preferably a PII detection model such as Piiranha; see the CloverDX Marketplace) and masks the identified tokens in the output.

This component behaves like AITokenClassifier, with the added ability to mask (anonymize) tokens whose classification score reaches the configured thresholds.
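
As a rough illustration of the masking rule (a token is masked when at least one configured class reaches its threshold), the following is a minimal Python sketch. The `Token` structure, the `mask_text` function, the class names, and the replacement of each character by one mask character are assumptions made for this example; they are not the component's actual implementation or output format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Token:
    """Hypothetical shape of one classified token (assumed for this sketch)."""
    start: int                # inclusive character offset in the text
    end: int                  # exclusive character offset in the text
    scores: Dict[str, float]  # class name -> score reported by the model


def mask_text(text: str, tokens: List[Token],
              thresholds: Dict[str, float], mask_char: str = "*") -> str:
    """Mask each token for which at least one configured class reaches its threshold."""
    chars = list(text)
    for tok in tokens:
        # Classes missing from the token's scores are treated as score 0.0.
        hit = any(tok.scores.get(cls, 0.0) >= limit
                  for cls, limit in thresholds.items())
        if hit:
            for i in range(tok.start, tok.end):
                chars[i] = mask_char
    return "".join(chars)


# Illustrative data only; class names and scores are made up.
text = "Call John at 555-0100"
tokens = [
    Token(5, 9, {"GIVENNAME": 0.97}),
    Token(13, 21, {"TELEPHONENUM": 0.42}),
]
thresholds = {"GIVENNAME": 0.8, "TELEPHONENUM": 0.5}
masked = mask_text(text, tokens, thresholds)
print(masked)
```

In this sketch the name is masked because its score reaches the 0.8 threshold, while the phone number stays visible because 0.42 is below 0.5. Lowering a threshold makes masking more aggressive for that class.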

Same input metadata	Sorted inputs	Inputs	Outputs	Each to all outputs	Java	CTL	Auto-propagated metadata
-	⨯	1	1	⨯	⨯	⨯	✓

Ports

Port type	Number	Required	Description	Metadata
Input	1	✓	The text(s) to classify	At least one `string` field
Output	1	✓	Copy of the input data with anonymized texts and the token classification result	Any

Metadata

AIAnonymizer propagates input metadata to output.

AIAnonymizer attributes

Attribute	Req	Description	Possible values
Model
Server model		Recommended: Use a model installed as a library on the CloverDX Server. Check CloverDX Marketplace for available ready-to-use models. This is a more convenient alternative to Classification model directory.
Classification model directory		Path to the machine learning model directory. It is required unless Server model is defined.
Model name	no	A read-only field displaying the name defined in the model configuration files (if available).
Device	yes	The device the model runs on: either the processor (CPU) or the graphics card (GPU). You must select the device the model is designed for. GPU models are much faster, but they require specialized hardware.	CPU (default) \| GPU
Model arguments	no	Configuration arguments for the model. See documentation of your particular model.
Tokenizer arguments	no	Configuration arguments for the tokenizer. See documentation of your particular model.
Translator arguments	no	Configuration arguments for the translator. See documentation of your particular model.
Input / output parameters
Fields to anonymize	yes	List of `string` fields to be classified and anonymized.
Anonymize classes and thresholds		List of token classes to anonymize. The classes are model-dependent; you can use a subset of them, but you cannot add classes unknown to the model. Each threshold defines the minimum score at which a token is anonymized for that class; a token is masked if at least one class reaches its threshold.
Mask character	no	The character that replaces the characters of anonymized tokens.
Anonymization information	no	An output field that stores the analysis results. It must be of the `variant` type. If the field already contains an analysis, the analyses are merged, so you can chain several AI components and use their combined output.
Batch size	no	Number of records the model processes together in one batch.	an integer
Error handling
Token overflow policy	no	Specifies the behavior when an input text cannot be encoded because it exceeds the model-specific maximum length. With the strict policy, the component fails; with the lenient policy, it logs a warning and truncates the input.	strict (default) \| lenient
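
The difference between the two Token overflow policy values can be pictured with a short Python sketch. This is only an illustration under stated assumptions: the function `apply_overflow_policy`, its parameters, and the use of a plain list of token IDs are hypothetical and do not reflect how the component is implemented internally.

```python
import logging

logger = logging.getLogger("overflow_sketch")


def apply_overflow_policy(token_ids, max_length, policy="strict"):
    """Return token_ids limited to max_length according to the chosen policy.

    Hypothetical helper: 'strict' raises an error, 'lenient' warns and truncates.
    """
    if len(token_ids) <= max_length:
        return token_ids  # input fits, nothing to do
    if policy == "strict":
        raise ValueError(
            f"Input has {len(token_ids)} tokens, model maximum is {max_length}"
        )
    if policy == "lenient":
        logger.warning(
            "Input truncated from %d to %d tokens", len(token_ids), max_length
        )
        return token_ids[:max_length]
    raise ValueError(f"Unknown policy: {policy!r}")
```

Note that with lenient truncation, any text beyond the model's maximum length is never seen by the model, so sensitive values in the truncated tail could go undetected. The strict default avoids that risk at the cost of failing the run.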

Compatibility

Version	Compatibility notice
7.1.0	AIAnonymizer is available since CloverDX version 7.1.
