+34 900 840 407
support@cytomic.ai

How to build searches compatible with the Cytomic Data Watch normalization process?

Related Products_
  • Cytomic Data Watch
Separating characters_

Cytomic Data Watch treats a set of special characters as separators between words. These characters are either removed entirely or replaced by a single space. They are as follows:

  • Return: \r
  • Line break: \n
  • Tab key: \t
  • Characters: ” : ; ! ? – + _ * = ( ) [ ] { } , . | % \ / ?
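The effect of the separators on a string can be illustrated with the minimal Python sketch below. It is not Cytomic's implementation. It assumes that the typographic "–" and "”" in the list stand for the ASCII "-" and '"', so both forms are included. It also assumes that the space the separators are replaced with is itself the word boundary. The name split_words is hypothetical.

```python
from __future__ import annotations

import re

# Separators as listed above, plus the space they are replaced with.
# Assumption: "–" and "”" in the list stand in for ASCII "-" and '"',
# so both forms are included.
SEPARATORS = " \r\n\t\"”:;!?–-+_*=()[]{},.|%\\/"

# re.escape protects characters such as "]", "-" and "\" inside the class.
_SPLIT_RE = re.compile("[" + re.escape(SEPARATORS) + "]+")


def split_words(text: str) -> list[str]:
    """Split text at every run of separators and drop empty fragments."""
    return [word for word in _SPLIT_RE.split(text) if word]


print(split_words("house.forest"))   # ['house', 'forest']
print(split_words("1.42.65.116-C"))  # ['1', '42', '65', '116', 'C']
```

Treating a run of consecutive separators as a single boundary prevents empty "words" from appearing between adjacent separators.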
Transformation of characters_

Every character string is converted to lowercase before it is stored in the database, whether or not it is recognized as a PII type. Administrator searches are also converted to lowercase, so searches are not case-sensitive: writing a term in uppercase or lowercase does not affect the result.
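Because the stored data and the search terms are both lowercased, the comparison behaves as if it were case-insensitive. The sketch below illustrates this. term_matches is a hypothetical exact-word comparison used only for illustration. The article does not describe how Cytomic Data Watch actually matches terms.

```python
def normalize_case(text: str) -> str:
    """Lowercase a string; applied to stored data and to search terms alike."""
    return text.lower()


def term_matches(stored: str, query: str) -> bool:
    """Hypothetical exact-term comparison: both sides are lowercased first."""
    return normalize_case(stored) == normalize_case(query)


print(term_matches("Forest", "FOREST"))   # True
print(term_matches("forest", "Forests"))  # False
```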

General rules for normalizing data recognized as personal data_
  • For PII types made up of numeric characters (telephone numbers, bank account numbers, etc.), separating characters are deleted and the resulting string is stored as a single entity. For example, “1.42.65.116-C” would be stored as PII type IDCARD “14265116C”.
  • IP addresses and email addresses are stored as they are.
  • For first names, last names and addresses, each word is stored independently, and words containing numbers are deleted. For example, “25 Upper Nelson Mandela Boulevard” would be stored as “upper”, “nelson”, “mandela” and “boulevard”.
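The three PII rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not the product's code. The function names are hypothetical. The separator set includes the space character and ASCII "-" and '"' as assumed equivalents of the listed "–" and "”". As a result, spaces inside numeric PII, such as a phone number, are also removed, which is an assumption. Case is left unchanged for numeric PII to mirror the article's IDCARD example. Names are lowercased to match the address example.

```python
from __future__ import annotations

import re

# Assumed separator set: the listed characters plus the space character,
# with ASCII "-" and '"' added alongside the typographic "–" and "”".
SEPARATORS = " \r\n\t\"”:;!?–-+_*=()[]{},.|%\\/"
_SEP_RE = re.compile("[" + re.escape(SEPARATORS) + "]+")


def normalize_numeric_pii(value: str) -> str:
    """Numeric PII (phone, bank account, ID card): delete every separator
    and keep the remainder as one token. Case is kept as in the IDCARD example."""
    return _SEP_RE.sub("", value)


def normalize_verbatim_pii(value: str) -> str:
    """IP addresses and email addresses are stored as they are."""
    return value


def normalize_name_or_address(value: str) -> list[str]:
    """Names and addresses: one lowercase token per word; any word that
    contains a digit is discarded."""
    words = [w for w in _SEP_RE.split(value) if w]
    return [w.lower() for w in words if not any(ch.isdigit() for ch in w)]


print(normalize_numeric_pii("1.42.65.116-C"))        # 14265116C
print(normalize_verbatim_pii("user@example.com"))    # user@example.com
print(normalize_name_or_address("25 Upper Nelson Mandela Boulevard"))
# ['upper', 'nelson', 'mandela', 'boulevard']
```

In practice, a search for an ID card number should contain the digits and letters without separators, while a search for an address should target individual words that contain no numbers.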
General rules for normalizing data not recognized as personal data_
  • Numerical and alphanumeric data (words formed by letters and numbers) that are not detected as PII are deleted during normalization, so searching for them returns no results.
  • Each separating character splits the character string into two independent words, and the separator itself is not stored. For instance, the string “house.forest” is stored as “house” and “forest”, and the separator “.” is deleted.
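A minimal Python sketch of these two rules for non-PII text follows. It also lets you check in advance whether a search term can ever return results. The function name normalize_non_pii is hypothetical. The separator set assumes the space character and ASCII "-" and '"' alongside the listed characters.

```python
from __future__ import annotations

import re

# Assumed separator set: the listed characters plus the space character,
# with ASCII "-" and '"' added alongside the typographic "–" and "”".
SEPARATORS = " \r\n\t\"”:;!?–-+_*=()[]{},.|%\\/"
_SEP_RE = re.compile("[" + re.escape(SEPARATORS) + "]+")


def normalize_non_pii(text: str) -> list[str]:
    """Lowercase, split at separators, and drop numeric or alphanumeric words."""
    words = [w for w in _SEP_RE.split(text.lower()) if w]
    return [w for w in words if not any(ch.isdigit() for ch in w)]


print(normalize_non_pii("house.forest"))             # ['house', 'forest']
print(normalize_non_pii("Invoice 2023 ab12.final"))  # ['invoice', 'final']
```

A search word that disappears from the output of this sketch, such as "2023" or "ab12" above, would not be present in the stored data and so could not match anything.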
Tips for constructing searches that are compatible with the normalization process_