Data Classification
  • 27 Dec 2022

Traceable lets you identify and classify standard and custom data types. Specific fields are classified into datatypes based on their content. Datatypes are then grouped into DataSets based on common usage, degree of data sensitivity, and relevant compliance regulations.

A datatype is a well-defined type of sensitive data, for example, an IP address, a driving license number, or a VIN. Traceable identifies these datatypes using built-in or custom rules.

In most situations, you should be able to use predefined datatypes to identify data flows within your application. If needed, you can refine a predefined datatype by adding specific fields to it or excluding specific fields from it. For example, if the 'Name' parameter usually holds the first or last name of a person, but some of your APIs use 'Name' for the name of a table, you can exclude that parameter from being identified as the 'Name' type by adding an 'Ignore' rule.

The easiest way to customize the definition of existing datatypes is from the API DNA page.

Note

Any changes you make to an existing or custom datatype take effect once Traceable sees new data (requests or responses) matching the new criteria.

The decision to suppress certain data values before the Traceable AI platform processes the traffic for anomalies is made at the DataSet level, as is the sensitivity level. You can edit both the sensitivity and the suppression parameters on the default DataSet to match your business requirements.

For data that is specific to your business, you can configure custom datatypes and DataSets. Navigate to Administration > Data Classification and click the + Add Dataset button to add a DataSet. You can also add custom datatypes to the default DataSet, as long as the new datatypes have the same level of sensitivity and you want them to be treated like the other datatypes in that DataSet for data suppression.

To create meaningful rules, you need a reasonable knowledge of regular expressions. You can use a website such as regex101 to test and learn regular expressions. Key and value regular expressions should be Google RE2 compliant.
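As an illustration of RE2-friendly key and value patterns, here is a minimal sketch. The patterns, the function names, and the "VIN" example are hypothetical and are not Traceable built-ins. Python's re module is not RE2, so the sketch only uses constructs that both engines accept, and it includes a rough heuristic that flags lookaround and backreferences, which RE2 does not support.

```python
import re

# Hypothetical patterns for a custom "VIN" datatype (assumption, not a built-in).
# Both avoid lookaround and backreferences, which RE2 rejects.
KEY_PATTERN = re.compile(r"(?i)^(vin|vehicle[_-]?id)$")
VALUE_PATTERN = re.compile(r"^[A-HJ-NPR-Z0-9]{17}$")  # 17 chars, no I, O, or Q

# Constructs that Python's `re` accepts but RE2 does not:
# lookahead/lookbehind groups and numbered backreferences.
RE2_UNSUPPORTED = re.compile(r"\(\?<?[=!]|\\[1-9]")


def looks_re2_compatible(pattern: str) -> bool:
    """Rough heuristic only: flags lookaround and numbered backreferences."""
    return RE2_UNSUPPORTED.search(pattern) is None


def field_matches(key: str, value: str) -> bool:
    """True when the key matches the key pattern and the value matches the value pattern."""
    return bool(KEY_PATTERN.search(key)) and bool(VALUE_PATTERN.search(value))
```

Testing a pattern against sample keys and values in this way, before entering it in a rule, can help catch false positives early. The heuristic is not a full RE2 validator.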

Note

You can also create a sensitive datatype directly from the API DNA tab in the API Endpoint details page. For more information, see API DNA.

Step 1 – Create Dataset

First, create a Dataset. After defining the Dataset, create the datatype rules.

You can choose the Sensitivity of the Dataset from:

  • Low
  • Medium
  • High
  • Critical

When you enable Suppress Data and choose either the Redact or the Obfuscate option, the Traceable agent sends the data from your environment to Traceable in redacted or obfuscated form.
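To make the difference between the two suppression options more concrete, here is a conceptual sketch. It assumes that redaction replaces the value with a fixed placeholder and that obfuscation replaces it with a one-way hash. This article does not specify how the Traceable agent implements either option, so the function name, the placeholder, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib


def suppress(value: str, mode: str) -> str:
    """Illustrative suppression; not the Traceable agent's actual behavior."""
    if mode == "redact":
        # Assumption: the original value is dropped and a fixed placeholder is sent.
        return "***"
    if mode == "obfuscate":
        # Assumption: a one-way hash is sent, so equal inputs stay comparable
        # while the original value is not exposed.
        return hashlib.sha256(value.encode("utf-8")).hexdigest()
    if mode == "none":
        return value
    raise ValueError(f"unknown suppression mode: {mode}")
```

Under these assumptions, a redacted value carries no information about the original, while an obfuscated value still lets two occurrences of the same input be recognized as equal.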

By default, all the datatypes in a DataSet are color-coded based on the DataSet sensitivity when they appear in API Catalog and other Traceable dashboards. If you wish, you can define a custom color for specific DataSets so that they are easier to distinguish visually.

Step 2 – Add Datatype

Click +Add New Datatype to add datatypes to the Dataset. The same datatype can be assigned to multiple DataSets. For example, the Credit-Card datatype can be a member of both the banking information and the PCI DSS In-scope DataSets.

To define a new datatype, enter the following information:

  • Datatype – Enter the name of the datatype. A list of existing datatypes is displayed; you can pick one of them or create a new one. The name of a new datatype must be unique.
  • Description – This is optional, but it is a good practice to provide a description for easy reference in the future.

Every datatype is defined by a combination of one or more rules with different scopes. The rules are evaluated in order, and the evaluation stops at the first match. You can change the order of the rules by dragging them up or down in the configuration window. Each API parameter can have no more than one datatype assigned.

  • Scope – Choose either System wide or Environment-specific. When you select System wide, the datatype rule applies to all the APIs. When you select Environment-specific, the datatype rule applies only to the APIs in the specified Environment.
  • Match Type – Select either Match or Ignore. With Match, the rule stamps all data that matches it with the current datatype. With Ignore, the rule prevents all data that matches it from being stamped with the current datatype. This can be considered a negative match rule.
  • Location – Choose the exact location in the span where this rule should apply, for example, a query parameter or a request or response header. You can also select Any Location, in which case the entire span is monitored.
  • Key – The key is mandatory. You can specify it as an exact match or as a regular expression. The name of the key is used as a first filter to identify whether the field may contain sensitive data.
  • Value (optional) – Matching on the value is optional, but it makes datatype detection more accurate and helps avoid false positives. As with the key, you can provide an exact match or a regular expression.

You can add one or more rules to a datatype. If you create more than one rule, the rules are combined with an OR operation, and evaluation stops at the first rule that matches. You can change the order of the rules by dragging and dropping them.
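The ordered, first-match evaluation described above can be sketched as follows. This is a simplified model and not Traceable's implementation: the Rule class, its field names, the location strings, and the example patterns are hypothetical, and rule scope (System wide or Environment-specific) is left out for brevity.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    match_type: str                      # "match" or "ignore"
    key_pattern: str                     # mandatory key regular expression
    value_pattern: Optional[str] = None  # optional value regular expression
    location: str = "any"                # e.g. "query", "request_header", or "any"

    def applies(self, location: str, key: str, value: str) -> bool:
        """True when location, key, and (if given) value all satisfy the rule."""
        if self.location != "any" and self.location != location:
            return False
        if re.search(self.key_pattern, key) is None:
            return False
        if self.value_pattern is not None and re.search(self.value_pattern, value) is None:
            return False
        return True


def classify(rules: List[Rule], location: str, key: str, value: str) -> bool:
    """Return True if the field should be stamped with the datatype.

    Rules are checked in order and evaluation stops at the first rule that applies.
    """
    for rule in rules:
        if rule.applies(location, key, value):
            return rule.match_type == "match"
    return False


# Hypothetical rules for a 'Name' datatype: the Ignore rule comes first so that
# table names are excluded before the broader Match rule is tried.
NAME_RULES = [
    Rule("ignore", r"(?i)^table_?name$"),
    Rule("match", r"(?i)name$", value_pattern=r"^[A-Za-z' -]+$"),
]
```

With these rules, classify(NAME_RULES, "query", "first_name", "Ada") returns True, while classify(NAME_RULES, "query", "table_name", "orders") returns False. If the two rules are swapped, the broader Match rule is reached first and table_name is stamped as well, which is why rule order matters.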

