- 09 Oct 2024
Data Classification
Data Classification is the process of identifying, categorizing, and labeling the data handled by APIs based on its sensitivity and nature. Traceable uses this classification to monitor, secure, and manage API interactions. It consists of three components:
Datatypes — This is the classification of data based on its nature, sensitivity, pattern, or specific rules.
Datasets — This is a group of datatypes based on their common usage, sensitivity, or relevant compliance regulations.
Overrides — This is the ability to manually adjust data handling according to your requirements.
Actions on the Data Classification page
Traceable automatically scans the data processed by APIs and categorizes it into pre-defined datatypes and datasets. In addition, you can do the following on the Data Classification page:
Add user-defined datatypes, datasets, and overrides.
Define custom classification rules in datatypes to meet specific business needs.
Define sensitivity and suppression configurations for each user-defined datatype.
Define classification rules and scope them to specific URLs or regex patterns.
Define rules that relax data suppression on values received from APIs when certain conditions are met, such as for a particular environment that you want to use for AST scans.
Define rules to redact or obfuscate values across environments according to your requirements.
Apart from the above, Traceable also allows you to:
Assign multiple datatypes to the same dataset. For example, the credit card number and bank name datatypes can be assigned to the banking information dataset.
Note
A dataset is simply a label attached to datatypes.
Assign one datatype to multiple datasets. For example, the credit card number datatype can be assigned to both the banking information and PCI DSS (in scope) datasets.
The above features ensure the following:
Relevant data is classified based on your defined rules.
Sensitive data is protected according to your business needs.
Your APIs benefit from enhanced security and compliance.
Sensitive and personal information processed by the APIs is adequately protected.
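The many-to-many relationship between datatypes and datasets can be pictured with a small Python sketch (Python 3.9 or later). It treats a dataset purely as a label stored on each datatype, as the note above describes. The `Datatype` class and `datatypes_in` helper are hypothetical illustrations, not part of Traceable.

```python
from dataclasses import dataclass, field

# Hypothetical model: a dataset is only a label attached to datatypes,
# so each datatype carries the set of dataset names it belongs to.
@dataclass
class Datatype:
    name: str
    datasets: set[str] = field(default_factory=set)


def datatypes_in(dataset: str, datatypes: list[Datatype]) -> list[str]:
    """Return the names of all datatypes labeled with the given dataset."""
    return [dt.name for dt in datatypes if dataset in dt.datasets]


catalog = [
    # One datatype can belong to several datasets...
    Datatype("Credit Card Number", {"Banking Information", "PCI DSS (in scope)"}),
    # ...and one dataset can contain several datatypes.
    Datatype("Bank Name", {"Banking Information"}),
    # A user-defined datatype may also belong to no dataset at all.
    Datatype("IP Address"),
]

print(datatypes_in("Banking Information", catalog))  # ['Credit Card Number', 'Bank Name']
print(datatypes_in("PCI DSS (in scope)", catalog))   # ['Credit Card Number']
```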
For more information on the components in the Data Classification page, see the following sections.
Datatypes
Datatypes are well-defined types of sensitive data, such as IP addresses, driving license numbers, and VINs. Traceable identifies these datatypes using built-in or custom rules and lets you classify them according to your requirements. Individual fields are classified into datatypes based on their values. For example, you can add a user-defined datatype to classify IPv4 addresses; based on your custom rules, Traceable identifies IP address values and classifies them.
In most situations, the pre-defined datatypes are sufficient to identify data flows within your application. If needed, you can refine a pre-defined datatype by including or excluding specific fields. For example, suppose the Name parameter usually holds the first or last name of a person, but some of your APIs use Name to hold the name of a table. You can exclude that parameter from being identified as the Name datatype by adding an Ignore rule.
Note
Changes to any datatype, pre-defined or user-defined, take effect only after Traceable observes new data (requests or responses) that matches the new criteria.
You can also create a sensitive datatype directly from the API DNA section on the API Endpoint details page. For more information, see API DNA.
You can add user-defined datatypes to meet your requirements. Each datatype is defined by one or more rules, each of which can have a different scope. Traceable evaluates the rules in order and stops at the first rule that matches; the Rule Type of that rule (Match or Exclude) determines whether the data is stamped with the datatype. To change the order, drag the rules up or down in the configuration window. While adding a datatype, you can also do the following:
Select a sensitivity for the datatype.
Select the dataset(s) to which the datatype should belong.
Suppress data values before Traceable processes the traffic for anomalies.
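The ordered, first-match evaluation described above can be illustrated with a minimal Python sketch (Python 3.9 or later). It is not Traceable's implementation: the `Rule` class, the `classify` function, and the example patterns (including the `tbl_` prefix used to recognize table names, echoing the Name example earlier) are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    rule_type: str      # "Match" or "Exclude"
    key_pattern: str    # regular expression applied to the field key
    value_pattern: str  # regular expression applied to the field value

    def applies_to(self, key: str, value: str) -> bool:
        return bool(re.fullmatch(self.key_pattern, key)) and bool(
            re.fullmatch(self.value_pattern, value)
        )


def classify(datatype: str, rules: list[Rule], key: str, value: str) -> Optional[str]:
    """Evaluate rules in order and stop at the first rule that applies."""
    for rule in rules:
        if rule.applies_to(key, value):
            # Match stamps the datatype; Exclude stamps nothing. Either way, stop here.
            return datatype if rule.rule_type == "Match" else None
    return None  # no rule applied, so the field is not stamped


rules = [
    # Hypothetical: "Name" fields whose value looks like a table name are excluded...
    Rule("Exclude", r"Name", r"tbl_\w+"),
    # ...while other name-like keys are stamped with the Name datatype.
    Rule("Match", r"(?i)(first_?|last_?)?name", r".+"),
]

print(classify("Name", rules, "Name", "tbl_orders"))  # None
print(classify("Name", rules, "Name", "Alice"))       # Name
# Reordering the rules changes the outcome, because evaluation stops at the first match.
print(classify("Name", list(reversed(rules)), "Name", "tbl_orders"))  # Name
```

Because evaluation stops at the first applicable rule, an Exclude rule only has an effect when it is placed above the Match rule it is meant to narrow.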
To add a datatype, navigate to Settings → Data Classification → Datatypes tab, click + Add Datatype, and complete the following steps:
Specify the datatype Name.
(Optional) Specify the datatype Description.
Specify the Match Rules:
(Optional) Specify the rule name.
Select the Environments to which the rule should apply. By default, the rule applies to all environments. When you select an environment, the datatype rule applies only to the APIs in that environment.
Select the Rule Type: Match or Exclude.
Match — If selected, Traceable checks for a match in the selected Location(s), and if found, it stamps the relevant data with the current datatype and stops further evaluation.
Exclude — If selected, Traceable checks for a match in the selected Location(s). If found, it does not stamp any data with the current datatype and stops further evaluation.
You can combine these rule types as needed. For example, add an Exclude rule followed by a Match rule to stamp all IP addresses except those starting with 123.
Select the Location in the span where the rule should apply, such as the request header or response body. By default, the rule applies to all locations.
Select the Key criteria (exact match or regular expression) from the drop-down and enter the key that Traceable should match. Traceable uses the key name as the first filter to determine whether a field may contain sensitive data.
Select the Value criteria from the drop-down. By default, the criteria is Any Value. As with Key, you can instead select an exact match or a regular expression and enter the value accordingly.
(Optional) Click + Add Rule next to Match Rules to add more rules, and configure each one as described in the previous steps.
Select the Sensitivity of the datatype.
(Optional) Select the Dataset(s) to which the datatype should belong.
Select the Data Suppression method. The Traceable agent applies the selected method to the data before sending it from your environment to the platform.
Click Save.
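The rule fields from the steps above (Environments, Location, Key, and Value) can be combined into one hypothetical model to show how the Exclude-then-Match example for IP addresses behaves. This Python sketch is an assumption-laden illustration, not Traceable's API: the `Criteria`, `MatchRule`, and `stamp` names, the `"exact"`/`"regex"` mode strings, the `client_ip` key names, and the IPv4 pattern are all invented for the example. `None` stands for the defaults (all environments, all locations, Any Value).

```python
import re
from dataclasses import dataclass
from typing import Optional


# Hypothetical model of the rule form; not Traceable's API.
@dataclass
class Criteria:
    mode: str     # "exact" or "regex" (hypothetical names for the drop-down options)
    pattern: str

    def matches(self, text: str) -> bool:
        if self.mode == "exact":
            return text == self.pattern
        return re.fullmatch(self.pattern, text) is not None


@dataclass
class MatchRule:
    rule_type: str                        # "Match" or "Exclude"
    key: Criteria
    value: Optional[Criteria] = None      # None stands for Any Value
    environments: Optional[set] = None    # None stands for all environments
    locations: Optional[set] = None       # None stands for all locations

    def applies(self, env: str, location: str, key: str, value: str) -> bool:
        if self.environments is not None and env not in self.environments:
            return False
        if self.locations is not None and location not in self.locations:
            return False
        # The key is the first filter; the value is checked only if the key matches.
        if not self.key.matches(key):
            return False
        return self.value is None or self.value.matches(value)


def stamp(datatype: str, rules: list, env: str, location: str, key: str, value: str) -> Optional[str]:
    """Return the datatype if the first applicable rule is a Match rule, else None."""
    for rule in rules:
        if rule.applies(env, location, key, value):
            return datatype if rule.rule_type == "Match" else None
    return None


OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
IPV4 = rf"{OCTET}(?:\.{OCTET}){{3}}"
IP_KEY = Criteria("regex", r"(?i)(client_)?ip(_address)?")  # hypothetical key names

ip_rules = [
    # Exclude rule first: values starting with 123. are never stamped.
    MatchRule("Exclude", IP_KEY, Criteria("regex", r"123\..*")),
    # Match rule second: every other IPv4 value is stamped as an IP address.
    MatchRule("Match", IP_KEY, Criteria("regex", IPV4)),
]

print(stamp("IP Address", ip_rules, "prod", "request header", "client_ip", "10.0.0.8"))      # IP Address
print(stamp("IP Address", ip_rules, "prod", "request header", "client_ip", "123.45.67.89"))  # None
```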
Traceable shows the created datatype in the Datatypes tab. Post-creation, you can also do the following:
Click the Status toggle corresponding to a datatype to enable or disable it.
Click the Ellipsis icon next to a datatype to edit or delete it.
Example
The following demo shows how to create a datatype to hash card numbers across environments. To add a datatype, navigate to Settings → Data Classification → Datatypes tab.
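As a rough illustration of what hashing a card number achieves, the following Python sketch replaces the value with a deterministic digest. This page does not describe Traceable's actual hashing scheme; the keyed HMAC-SHA-256 used here, the `HASH_KEY` value, and the `hash_card_number` helper are assumptions. A keyed hash is chosen because card numbers have a small search space, so an unkeyed digest could be reversed by brute force.

```python
import hashlib
import hmac

# Hypothetical key. Traceable's actual hashing scheme is not documented here;
# HMAC-SHA-256 is used only to illustrate the idea.
HASH_KEY = b"example-key-do-not-use"


def hash_card_number(card_number: str) -> str:
    """Return a keyed SHA-256 digest of the card's digits in place of the raw value."""
    digits = "".join(ch for ch in card_number if ch.isdigit())  # ignore spaces and dashes
    return hmac.new(HASH_KEY, digits.encode(), hashlib.sha256).hexdigest()


token_a = hash_card_number("4111 1111 1111 1111")
token_b = hash_card_number("4111-1111-1111-1111")
token_c = hash_card_number("5500 0000 0000 0004")

print(token_a == token_b)  # True: the same card always produces the same token
print(token_a == token_c)  # False: different cards produce different tokens
```

Because the digest is deterministic for a given key, the same card number yields the same token in every environment, so requests can still be correlated without exposing the raw value.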
Datasets
Datasets are groups of similar or related datatypes processed in API requests and responses. You can configure user-defined datasets to meet your requirements. User-defined datatypes can remain independent of datasets, or you can assign them to one or more user-defined datasets. Based on these datasets and its analysis of the data received from your application, Traceable displays a chart in the summary section of the Sensitive Data page under Catalog.
To create a dataset, navigate to Settings → Data Classification → Datasets tab, click + Add Dataset, and complete the following steps:
Specify a Name for the dataset.
(Optional) Specify a Description for the dataset.
(Optional) Select an Icon for the dataset.
Click Save.
Traceable shows the dataset in the Datasets tab. Click the Ellipsis icon next to a dataset to edit or delete it. To assign the dataset to a datatype, navigate to the Datatypes tab and either create a new datatype or edit an existing one.
Example
The following demo shows how you can create a PCI DSS compliance dataset that can contain payment-related datatypes. To create a dataset, navigate to Settings → Data Classification → Datasets tab.
Overrides
Overrides let you customize how Traceable handles data. They are useful in the following situations:
When you want to run AST scans on raw values, that is, the unprocessed, non-obfuscated, and non-redacted values transmitted through the API. By default, the Traceable agent suppresses sensitive data before sending it to the platform, and AST scans are not possible on suppressed data. For more information on running the scans, see API Security Testing.
When you want to redact or obfuscate values in specific environments.
You can set up overrides that apply only when specific conditions are met. To create an override, navigate to Settings → Data Classification → Overrides tab, click + Add Override, and complete the following steps:
Specify a Name for the override.
(Optional) Specify a Description for the override.
From the Scope drop-down, select the environment where the override should apply.
From the Transformation drop-down, select how Traceable should transform the data as part of the override. By default, the Traceable agent sends data to the platform using the suppression defined for the datatype. If you select Use raw values, the agent sends raw values for use in AST scans. If you select Redact or Obfuscate values, Traceable transforms the values according to the criteria you specify.
Note
If the override matches, its Transformation takes precedence over the Data Suppression method specified in a user-defined datatype.
Click + Add Condition and do the following:
Select the Attribute Key criteria (exact match or regular expression) from the drop-down and enter the key that Traceable should check. For example, add an attribute key and value to receive raw data from a specific IP address.
You can find the attribute keys for individual results in the Endpoint Traces and Spans tab under Analytics. For more information on traces and spans, see Traces.
Note
You can use all attribute keys except for servicename.
Select the Value criteria from the drop-down. By default, the criteria is Exists, which checks only whether the key has a value. As with Attribute Key, you can instead select an exact match or a regular expression.
Click Save.
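The precedence rule for overrides can be sketched in Python as follows. Several details are assumptions that this page does not confirm: that all conditions on an override must hold (logical AND), that the first matching override wins, and that a datatype's suppression method is named `"Hash"`. The `Condition`, `Override`, and `effective_transformation` names and the `source_ip` attribute key are hypothetical; look up real attribute keys in the Endpoint Traces and Spans tab.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical model of an override; not Traceable's API.
@dataclass
class Condition:
    key_pattern: str                     # regular expression for the attribute key
    value_pattern: Optional[str] = None  # None stands for the default "Exists" check

    def holds(self, attributes: dict) -> bool:
        """True if some attribute key matches and its value satisfies the criteria."""
        for key, value in attributes.items():
            if re.fullmatch(self.key_pattern, key):
                if self.value_pattern is None or re.fullmatch(self.value_pattern, str(value)):
                    return True
        return False


@dataclass
class Override:
    name: str
    environment: str      # the environment selected in the Scope drop-down
    transformation: str   # for example "Use raw values", "Redact", or "Obfuscate"
    conditions: list = field(default_factory=list)

    def matches(self, env: str, attributes: dict) -> bool:
        # Assumption: all conditions must hold (logical AND).
        return env == self.environment and all(c.holds(attributes) for c in self.conditions)


def effective_transformation(datatype_suppression: str, overrides: list, env: str, attributes: dict) -> str:
    """A matching override takes precedence over the datatype's Data Suppression method."""
    for override in overrides:  # assumption: the first matching override wins
        if override.matches(env, attributes):
            return override.transformation
    return datatype_suppression


ast_override = Override(
    name="raw-values-for-ast",
    environment="staging",
    transformation="Use raw values",
    conditions=[Condition(r"source_ip", r"10\.1\.2\.3")],  # hypothetical attribute key
)

print(effective_transformation("Hash", [ast_override], "staging", {"source_ip": "10.1.2.3"}))  # Use raw values
print(effective_transformation("Hash", [ast_override], "prod", {"source_ip": "10.1.2.3"}))     # Hash
```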
Traceable shows the configured overrides in the Overrides tab. Click the Ellipsis icon next to an override to view, edit, or delete it.
Example
The following demo shows how you can create an override to use raw values for a specific environment. To create an override, navigate to Settings → Data Classification → Overrides tab.