eBPF

eBPF (extended Berkeley Packet Filter) is a powerful feature in the Linux kernel that allows for the safe execution of user-defined programs within the kernel space. This technology enables functionalities like packet filtering, tracing, and performance analysis without modifying kernel source code or loading custom kernel modules.

In Linux systems, the operating environment is divided into two primary spaces: kernel space and user space. Kernel space is where the operating system's core operates with full access to hardware resources like memory, CPU, and storage. Conversely, user space is where application software runs with restricted hardware access, relying on the kernel to perform low-level operations. eBPF bridges these two spaces by allowing custom code to execute safely within the kernel, providing high-performance data processing capabilities.

Traceable's eBPF Solution

Traceable utilizes eBPF to enhance its monitoring and analysis capabilities. By attaching probes to essential kernel functions involved in network socket transactions—such as open, connect, read, write, and close—Traceable can collect valuable data with minimal overhead. These probes operate based on specific parameters, enabling Traceable to decide dynamically whether to collect data, thus optimizing performance and resource usage.

Note:

Request blocking is not supported with the eBPF-based Traceable agent deployment.

The following diagram shows a high-level flow of how Traceable's eBPF-based collection solution works:

Note

This topic explains the steps to install the eBPF solution using Helm charts and Terraform. You can also install eBPF on a virtual machine (VM) using the installation script. For more information, see the Install eBPF using script topic.

Before you begin

Ensure that the following prerequisites are met to install an eBPF-based Traceable agent.

Linux kernel – The following kernel versions are supported with BTF (BPF Type Format) enabled:
- RHEL 7 and CentOS 7 – The underlying Linux kernel should be 3.10.0-1160.76 or later.
- Ubuntu, Debian, and RHEL 8 – The underlying Linux kernel should be 4.18 or later.
Kernel build – The Linux kernel must be built with the CONFIG_DEBUG_INFO_BTF=y option. To check, enter the following command and look for CONFIG_DEBUG_INFO_BTF=y in the output:
```
cat /boot/config-$(uname -r) | grep BTF
```
Capabilities – SYS_PTRACE and SYS_ADMIN capabilities in Kubernetes. You can check this in Traceable's Helm template. A snippet is shown below:
```
capabilities:
            add:
            - SYS_PTRACE
            - SYS_ADMIN
```
Traceable agent – Traceable agent 1.19.2 or later.
Traceable access token – In the Traceable platform (UI), navigate to Settings → Access Token and click Generate Agent Token. Copy the token.
Privileged user – The deployment requires privileged user access. See step 2 of the Installation section, where ebpfRunAsPrivileged is set to true.
The eBPF solution intercepts traffic at the kernel level; therefore, no specific ports need to be opened to install Traceable's agent.
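
The kernel prerequisites above can also be checked from a script. The following is a minimal Python sketch, not part of the Traceable tooling: the function names are hypothetical, the config file is assumed to live at /boot/config-<release> as in the command above, and the version check compares only the major and minor numbers, so it covers the 4.18 requirement but not the RHEL 7 and CentOS 7 patch-level requirement.

```python
import platform
import re
from pathlib import Path


def kernel_at_least(release: str, minimum: tuple = (4, 18)) -> bool:
    """Return True if a `uname -r` style release is >= minimum (major, minor)."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum


def btf_enabled(config_text: str) -> bool:
    """Return True if the kernel build config contains CONFIG_DEBUG_INFO_BTF=y."""
    return any(
        line.strip() == "CONFIG_DEBUG_INFO_BTF=y"
        for line in config_text.splitlines()
    )


def check_host() -> dict:
    """Check the running host (hypothetical helper for illustration)."""
    release = platform.release()  # same value as `uname -r`
    config = Path(f"/boot/config-{release}")
    return {
        "kernel_ok": kernel_at_least(release),
        # None means the config file was not found, so the BTF status is unknown.
        "btf_ok": btf_enabled(config.read_text()) if config.is_file() else None,
    }
```

Calling check_host() on a node returns both results in one dictionary; a btf_ok value of None means the build configuration could not be located and must be checked another way.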

Installation

You can install the Traceable agent for eBPF using either Helm or Terraform.

Option 1 - Installation using Helm

Complete the following steps to install the Traceable agent for eBPF using Helm:

Create namespace - Enter the following command to create a separate namespace for Traceable:
```
kubectl create namespace traceableai
```

Define values.yaml - Define a sample values.yaml file to install the agent. For example:

token: <ACCESS_TOKEN>
environment: <ENVIRONMENT_NAME>
runAsDaemonSet: false
daemonSetMirroringEnabled: true
ebpfCaptureEnabled: true
ebpfRunAsPrivileged: true

(Optional) Configure tolerations - You can configure tolerations for eBPF pods by setting the ebpfTolerations variable in the values.yaml file above. For more information, see Taints and Tolerations.
```
ebpfTolerations:
  - key: "env"
    operator: "Equal"
    value: "prod"
  - key: "your-app"
    operator: "Exists"
```

Run the following command to install the Traceable agent in daemonset mode:

helm repo add traceableai https://helm.traceable.ai
helm repo update
helm install --namespace traceableai traceable-agent traceableai/traceable-agent --values values.yaml

Verify that Traceable agent pods are created. Enter the following command:

kubectl get pods -n traceableai

The output should be similar to the following:

NAME                               READY   STATUS    RESTARTS   AGE
traceable-agent-6b87685fb4-ghb58   1/1     Running   0          17m
traceable-ebpf-tracer-ds-2kx2l     1/1     Running   0          55s

Option 2 - Installation using Terraform

Complete the following steps to install the Traceable agent for eBPF using Terraform:

Download - Enter the following command to download the Traceable Platform agent Terraform tarball:
```
curl -O https://downloads.traceable.ai/install/traceable-agent/terraform/kubernetes/latest/traceable-agent-tf-k8s.tar.gz
```
Untar and change directory - Enter the following command to untar the tarball and change the directory:
```
tar xvzf traceable-agent-tf-k8s.tar.gz
cd traceable-agent-tf-k8s
```
Create namespace - Enter the following command to create a separate namespace for Traceable:
```
kubectl create namespace traceableai
```

tfvars file - Create a terraform.tfvars file. A sample file is shown below.

token                        = ""
endpoint                     = "api.traceable.ai"
environment                  = ""
run_as_daemon_set            = false
daemon_set_mirroring_enabled = true
ebpf_capture_enabled         = true
ebpf_run_as_privileged       = true

(Optional) Configure tolerations - You can configure tolerations for eBPF pods by setting the ebpf_tolerations variable in the terraform.tfvars file above. For more information, see Taints and Tolerations.
```
ebpf_tolerations = [
  {
    key = "your-app",
    operator = "Exists",
    value = null,
    effect = "NoSchedule",
    toleration_seconds = null,
  }
]
```
Explanation:
- ebpf_tolerations: A list containing the tolerations for eBPF pods.
- key: The taint key on the node that the toleration targets.
- operator: Specifies how the toleration is evaluated. In this example, “Exists” means that the toleration matches as long as the taint key exists on the node.
- value: The value associated with the taint key. Here it is set to null, which means that any value associated with the taint key is accepted.
- effect: The taint effect that the toleration matches. Here the effect is “NoSchedule”, so the toleration allows the pod to be scheduled on nodes that carry the specified taint key with that effect.
- toleration_seconds: In Kubernetes, this field applies only to the NoExecute effect and sets how long the pod stays on the node after the taint is added. Here it is null, so no time limit is defined.
To summarize, this example creates a toleration that allows eBPF pods to be scheduled on nodes with the taint key “your-app” and the taint effect “NoSchedule”. The toleration applies as long as the taint key exists on the node, regardless of the associated value.
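
The matching behavior described in this explanation follows standard Kubernetes toleration semantics, which can be sketched in Python as follows. The Taint and Toleration classes and the tolerates function are illustrative names chosen for this sketch; they are not part of the Traceable Terraform module or of any Kubernetes client library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Taint:
    key: str
    effect: str
    value: Optional[str] = None


@dataclass
class Toleration:
    key: Optional[str] = None
    operator: str = "Equal"
    value: Optional[str] = None
    effect: Optional[str] = None


def tolerates(toleration: Toleration, taint: Taint) -> bool:
    """Return True if the toleration matches the taint."""
    # An unset effect on the toleration matches every taint effect.
    if toleration.effect and toleration.effect != taint.effect:
        return False
    # An unset key (used with Exists) matches every taint key.
    if toleration.key and toleration.key != taint.key:
        return False
    # Exists ignores the taint value; Equal requires the values to match.
    if toleration.operator == "Exists":
        return True
    return (toleration.value or "") == (taint.value or "")


# The toleration from the terraform.tfvars example above.
example = Toleration(key="your-app", operator="Exists", effect="NoSchedule")
```

With this sketch, the example toleration matches a your-app taint with the NoSchedule effect whatever its value, and does not match the same key with a different effect.
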
Apply - Enter the following command to apply Terraform:
```
terraform init
terraform apply
```

Verification - Enter the following command to verify a successful installation.

kubectl get pods -n traceableai

Following is an example output of a successful installation:

NAME                               READY   STATUS    RESTARTS   AGE
traceable-agent-6b87685fb4-ghb58   1/1     Running   0          17m
traceable-ebpf-tracer-ds-2kx2l     1/1     Running   0          55s

You can also verify a successful installation by navigating to API Catalog → Services and checking for ebpf in the traceable.module.name field, as shown in the screenshot below.

Independent eBPF Helm deployment

The eBPF tracer can be deployed independently using Helm, separate from the Traceable Platform Agent (TPA) Helm release. This uses the existing traceable-agent Helm charts with one key configuration value set: ebpfOnly. The following are the Helm configuration values for this deployment:

ebpfOnly — Controls whether only eBPF resources are deployed (true for standalone eBPF).
ebpfReportingEndpoint — Endpoint for reporting eBPF data.
ebpfRemoteEndpoint — Remote endpoint for eBPF communication.
ebpfToTpaTlsEnabled — Enables TLS between eBPF and TPA.
tpaCaBundle — Base64-encoded CA cert for standalone TPA client deployments.
tpaCaCertSecret — Secret containing the TPA CA cert, used in the same namespace as the eBPF DaemonSet.
tpaCaCertFile — Absolute file path for the CA cert injected into the eBPF tracer container.

Example configuration

# eBPF setup
daemonSetMirroringEnabled: true
ebpfCaptureEnabled: true
ebpfRunAsPrivileged: true
ebpfHttp2CaptureEnabled: true
daemonSetMirroring:
  matchSelectors:
    - field_selectors:
      - "metadata.namespace=testgoapp"

# eBPF-only deployment
ebpfOnly: true
ebpfReportingEndpoint: "agent.traceableai:5443"
ebpfRemoteEndpoint: "agent.traceableai:5443"
ebpfToTpaTlsEnabled: true
tpaCaCertSecret:
  secretName: "traceable-agent-cert"
  caCertFileName: "root_ca.crt"

This configuration deploys the eBPF tracer independently while still integrating with the Traceable platform. For deployments using TLS, make sure to provide the root CA certificate, either as a Base64 bundle or as a secret. Once configured, you can deploy the eBPF tracer using:

helm install traceable-ebpf-tracer traceableai/traceable-agent -n traceableai --values values-ebpf-only.yaml

This command creates an independent Helm release for the eBPF tracer.

Enable or disable mirroring

Use the following options to configure mirroring:

Enable mirroring for all namespaces

Mirroring is disabled by default. To enable mirroring for all namespaces, use the following configuration:

If you are using Helm, set daemonSetMirrorAllNamespaces: true in values.yaml.
If you are using Terraform, set daemon_set_mirror_all_namespaces = true in main.tf.

Enable mirroring for a namespace

To enable mirroring for a namespace, set the namespace label traceableai-mirror to enabled or enter the following command:

kubectl label ns <namespace> traceableai-mirror=enabled

Disable mirroring for a namespace

To disable mirroring for a namespace, set the namespace label traceableai-mirror to disabled or enter the following command:

kubectl label ns <namespace> traceableai-mirror=disabled

Disable mirroring for a pod

To disable mirroring for a pod, set the pod annotation mirror.traceable.ai/enabled to false. Enter the following command:

kubectl patch deployment <deployment> -n <namespace> -p '{"spec": {"template":{"metadata":{"annotations":{"mirror.traceable.ai/enabled":"false"}}}} }'

Set the mirroring mode

By default, only ingress traffic is captured. By setting the appropriate annotations, you can instead capture only egress traffic or both ingress and egress traffic.

Capture egress traffic

Set the following annotations to capture the egress traffic for a deployment or namespace.

Deployment

To capture the egress traffic, set the deployment annotation mirror.traceable.ai/mode to egress. Enter the following command:

kubectl patch deployment <deployment> -n <namespace> -p '{"spec": {"template":{"metadata":{"annotations":{"mirror.traceable.ai/mode":"egress"}}}} }'

Namespace

To capture egress traffic at the namespace level, set the annotation mirror.traceable.ai/defaultMode to egress. Enter the following command:

kubectl annotate namespace <NAMESPACE> mirror.traceable.ai/defaultMode=egress

Capture ingress and egress traffic

Set the following annotations to capture both ingress and egress traffic for a deployment or namespace.

Deployment

To capture ingress and egress traffic for a deployment, set the deployment annotation mirror.traceable.ai/mode to ingress_and_egress. Enter the following command:

kubectl patch deployment <deployment> -n <namespace> -p '{"spec": {"template":{"metadata":{"annotations":{"mirror.traceable.ai/mode":"ingress_and_egress"}}}} }'

Namespace

To capture the ingress and egress traffic at a namespace level, set the annotation mirror.traceable.ai/defaultMode to ingress_and_egress. Enter the following command:

kubectl annotate namespace <NAMESPACE> mirror.traceable.ai/defaultMode=ingress_and_egress
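
The deployment patch commands in this section differ only in the annotation value, so the JSON payload can be generated rather than typed by hand. The following Python sketch is one way to do that; the function names are hypothetical, and it accepts only the two mode values named in this section.

```python
import json
import shlex

MODE_ANNOTATION = "mirror.traceable.ai/mode"
# Only the two values documented in this section are accepted.
SUPPORTED_MODES = ("egress", "ingress_and_egress")


def mode_patch(mode: str) -> str:
    """Build the JSON patch body that sets the mirroring mode annotation."""
    if mode not in SUPPORTED_MODES:
        raise ValueError(f"unsupported mirroring mode: {mode!r}")
    patch = {
        "spec": {
            "template": {
                "metadata": {"annotations": {MODE_ANNOTATION: mode}}
            }
        }
    }
    return json.dumps(patch)


def patch_command(deployment: str, namespace: str, mode: str) -> str:
    """Return a shell-quoted kubectl patch command for the deployment."""
    args = [
        "kubectl", "patch", "deployment", deployment,
        "-n", namespace,
        "-p", mode_patch(mode),
    ]
    return shlex.join(args)
```

Printing patch_command("web", "shop", "egress") yields a command equivalent to the egress example above, with the JSON payload safely quoted for the shell. The names web and shop are placeholders.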

Capture pod label

You can capture specific pod labels in your traces, providing deeper insights into your Kubernetes deployments.

To utilize this, you need to configure the Helm value ebpfPodLabels, where you can specify the pod label keys that should be captured as span attributes. For instance, if your pods have labels like name: foobar and version: 1.19.0, and you want to capture the name label, you would configure it as follows:

ebpfPodLabels:
  - name

After generating some traffic, the captured label will appear on the Platform as k8s.pod.label.name. If your pod label includes a more complex key structure, such as app.kubernetes.io/name, it will be displayed as k8s.pod.label.app.kubernetes.io/name.
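
The naming rule described above can be expressed as a short Python sketch. It is an illustration of the mapping only, not the tracer's implementation: the function name is hypothetical, and it assumes the label value is carried over unchanged as the attribute value.

```python
LABEL_PREFIX = "k8s.pod.label."


def label_attributes(pod_labels: dict, captured_keys: list) -> dict:
    """Map the configured pod label keys to span attribute names."""
    return {
        LABEL_PREFIX + key: pod_labels[key]
        for key in captured_keys
        if key in pod_labels  # labels the pod does not have are skipped
    }


# Labels from the example above; only `name` is listed in ebpfPodLabels.
labels = {"name": "foobar", "version": "1.19.0"}
attributes = label_attributes(labels, ["name"])
```

Here attributes contains a single entry, k8s.pod.label.name, and the version label is left out because it is not listed in ebpfPodLabels.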

Logs

The eBPF package allows you to customize the logging configuration. These settings are part of the ebpf-tracer-config.yaml file. The following is a snippet of the logging configuration:

logging:
  level: info
  encoding: "json"
  output_paths:
    - stdout
  error_output_paths:
    - stderr
  log_rotation:
    enabled: false
    filename: "tracer.log"
    filepath: "/var/traceable/log/ebpf-tracer"
    max_size: 10 # megs
    max_backups: 10

The following table explains the configurations:

Parameter	Description
`level`	The logging level. The values can be `trace`, `debug`, `info`, `warn`, or `error`. The default value is `info`.
`encoding`	The log encoding. Valid values are `json` or `console`.
`output_paths`	The path to which output is sent. The default value is `stdout`.
`error_output_paths`	The stream to which error logs are printed. The default value is `stderr`.

Log rotation

The following table explains the log rotation configurations from the above snippet:

Parameter	Description
`enabled`	Enables or disables log rotation. The default value is `false`.
`filename`	The name of the file to which logs are written. The default value is `tracer.log`.
`filepath`	The path to save the log file.
`max_size`	The file size in megabytes, after which the log file is rotated.
`max_backups`	The number of previous backups that are retained.

Upgrade

You can upgrade the Traceable agent in Kubernetes using the following Helm commands:

Update the Helm charts by entering the following command:
```
helm repo update traceableai
```
Enter the following command to upgrade the Traceable agent to the latest version:
```
helm upgrade traceable-agent --namespace traceableai traceableai/traceable-agent
```

Uninstall

Enter the following command to uninstall the Platform agent using Helm:

helm uninstall traceable-agent --namespace traceableai

Troubleshooting

Troubleshooting for eBPF starts with collecting container logs. Enter the following command to collect the logs, replacing traceable-agent-bzxts with the name of your pod:

kubectl logs -n traceableai traceable-agent-bzxts traceable-ebpf-tracer

The following are a few steps that you can take to troubleshoot eBPF issues:

Verify correct configuration

The current configuration is part of the logs. As soon as the configuration is parsed, it is available in the logs. A sample configuration log entry is shown below:

time="2022-07-18T09:39:56Z" level=info msg="config log_level:{value:\"info\"} proc_fs_path:{value:\"/hostproc\"}
unix_domain_socket_addr:{value:\"/var/log/sock/eve.json\"}perfmap_queue_size:{value:1024} uds_event_queue_size:
{value:10000} probe_event_queue_size:{value:50000}capture_all_namespaces:{} k8s_enabled:{value:true}
mode:{value:\"all\"} max_active_ret_probe:{value:1}"

Error related to BTF (BPF Type Format) not found

You are not likely to encounter this error in the latest Linux kernels, as most of them have a vmlinux file in the /boot directory. The /boot directory contains the debug information that is required to run the eBPF program. For Linux kernels that do not have a vmlinux file, Traceable ships BTF files in the eBPF container. These BTF files are also available on Traceable's download site.

The ebpf-tracer first checks for a vmlinux file. It then checks for a BTF file locally based on the OS information. If the ebpf-tracer does not find the file, it downloads it from the download site. If all these steps fail, reach out to Traceable support with the OS details from the log files. The OS information is available in the logs when vmlinux is not found.

time="2022-07-18T09:39:57Z" level=info msg=system info {"sysinfo":{"version":"0.9.5","timestamp":"2022-07-18T10:22:37.753642303Z"},
"node":{"hostname":"sant","machineid":"4b4d738cd6864265b10089357502600c",
"hypervisor":"vmware","timezone":"Etc/UTC"},"os":{"name":"Ubuntu 18.04.6 LTS",
"vendor":"ubuntu","version":"18.04","release":"18.04.6","architecture":"amd64"},
"kernel":{"release":"4.19.0-041900-generic","version":"#201810221809 SMP Mon Oct 22 22:11:45 UTC 2018",
"architecture":"x86_64"},"product":{"name":"VMware Virtual Platform",
"vendor":"VMware, Inc.","version":"None","serial":"VMware-56 4d c7 1f 27 9f 91 58-0c af 0a e1 90 79 28 bb"},
"board":{"name":"440BX Desktop Reference Platform","vendor":"Intel Corporation","version":"None","serial":"None"},
"chassis":{"type":1,"vendor":"No Enclosure","version":"N/A","serial":"None",
"assettag":"No Asset Tag"},"bios":{"vendor":"Phoenix Technologies LTD","version":"6.00","date":"11/12/2020"},
"cpu":{"vendor":"GenuineIntel","model":"Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz","speed":2600,"cache":12288,"threads":2},"memory":{"type":"DRAM","size":4096},
"storage":[{"name":"sda","driver":"sd","vendor":"VMware,","model":"VMware Virtual S","size":21}],
"network":[{"name":"ens33","driver":"e1000","macaddress":"00:0c:29:79:28:bb","port":"tp","speed":1000}]}

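The OS details that support needs can be pulled out of the system info entry with a few lines of Python. This is a minimal sketch with a hypothetical function name; it assumes the log entry is available as a single line (the sample above is wrapped for readability) and that the JSON payload is unescaped, as it appears in the sample.

```python
import json


def os_details(log_line: str) -> dict:
    """Extract OS and kernel details from a `system info` log entry."""
    # The JSON payload runs from the first "{" to the last "}" on the line.
    start = log_line.index("{")
    end = log_line.rindex("}")
    info = json.loads(log_line[start:end + 1])
    return {
        "os_name": info["os"]["name"],
        "os_version": info["os"]["version"],
        "architecture": info["os"]["architecture"],
        "kernel_release": info["kernel"]["release"],
    }
```

For the sample entry above, this returns the Ubuntu 18.04.6 LTS name, the amd64 architecture, and the 4.19.0-041900-generic kernel release.
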
Check if pods are being tracked

For each pod, the logs contain the following information:

time="2022-07-18T09:39:57Z" level=info
msg="Added pod to maps. {\"Name\":\"linkerd-proxy-injector-6848fbbc4-hs4wj\",
\"Namespace\":\"linkerd\",\"Service\":\"linkerd-proxy-injector.linkerd\",\"Enabled\":false,\"Mode\":0}"

Name: name of the pod
Namespace: namespace of the pod
Service: service to which the pod belongs
Enabled: true if ebpf-tracer is tracking this pod
Mode: 0 for ingress and 1 for egress
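
To find pods that are not being tracked across a large log file, the entries can be parsed with a short script. The following Python sketch uses hypothetical function names and assumes each entry is on a single line with the JSON quotes escaped as in the sample above. Only the two Mode values listed above are mapped; any other value is reported as unknown.

```python
import json

# Mode values as listed in the field descriptions above.
MODE_NAMES = {0: "ingress", 1: "egress"}


def parse_pod_entry(log_line: str):
    """Return the pod fields from an 'Added pod to maps' entry, else None."""
    if "Added pod to maps" not in log_line:
        return None
    start = log_line.index("{")
    end = log_line.rindex("}")
    # The payload is JSON with escaped quotes inside the msg="..." field.
    payload = log_line[start:end + 1].replace('\\"', '"')
    return json.loads(payload)


def mode_name(entry: dict) -> str:
    """Translate the numeric Mode field into a readable name."""
    return MODE_NAMES.get(entry["Mode"], "unknown")


def untracked_pods(log_lines) -> list:
    """List namespace/name for every pod logged with Enabled set to false."""
    names = []
    for line in log_lines:
        entry = parse_pod_entry(line)
        if entry is not None and not entry["Enabled"]:
            names.append(f'{entry["Namespace"]}/{entry["Name"]}')
    return names
```

Passing the lines of a collected log file to untracked_pods gives a quick list of pods to compare against the mirroring labels and annotations.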

Check Statistics

You can check the statistics to see whether requests are being parsed or dropped.

time="2022-07-18T09:41:57Z" level=info
msg="stats {\"ControlEventReceived\":0,\"DataEventReceived\":0,
\"TotalRequestsParsed\":0,
\"ReqParsingErrors\":0,
\"ResParsingErrors\":0,
\"EventLost\":0,
\"TotalRequestsSent\":0,
\"TotalEventsDroppedAtEventQueueLimit\":0,
\"TotalEventsDroppedAtParsing\":0,
\"KprobeEventMaxQueueSizeTillNow\":1}"

ControlEventReceived: Kprobe events received from the kernel for accept, connect, and close calls. Kprobes is a debugging mechanism for the Linux kernel that can also be used to monitor events in a production system, for example to find performance bottlenecks, log specific events, and trace problems.
DataEventReceived: [k/u]probe (kprobes and uprobes) events received from the kernel with data (HTTP).
TotalRequestsParsed: Total requests parsed successfully.
ReqParsingErrors: Errors that occurred while parsing requests.
ResParsingErrors: Errors that occurred while parsing responses.
EventLost: Events lost while reading from perf buffers. This happens when event consumption is slower than event production in the kernel.
TotalRequestsSent: Total spans queued for sending to the Traceable Platform Agent.
TotalEventsDroppedAtEventQueueLimit: Events dropped because the parsing queue was full. ebpf-tracer maintains a queue of events for parsing; if this count keeps increasing, ebpf-tracer needs more CPU.
TotalEventsDroppedAtParsing: Number of events dropped during parsing of data. This can occur due to out-of-order events.
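
Because several of these counters matter only when they keep growing, it helps to compare two stats entries taken some time apart. The following Python sketch does that; the function names and the choice of which counters to watch are assumptions made for illustration, and each log entry is assumed to be on a single line with quotes escaped as in the sample above.

```python
import json

# Counters from the list above that indicate lost or dropped events.
DROP_COUNTERS = (
    "EventLost",
    "TotalEventsDroppedAtEventQueueLimit",
    "TotalEventsDroppedAtParsing",
)


def parse_stats(log_line: str) -> dict:
    """Extract the counters from a `stats` log entry."""
    start = log_line.index("{")
    end = log_line.rindex("}")
    # The payload is JSON with escaped quotes inside the msg="..." field.
    payload = log_line[start:end + 1].replace('\\"', '"')
    return json.loads(payload)


def growing_drop_counters(previous: dict, current: dict) -> dict:
    """Return the increase of each drop counter that grew between entries."""
    growth = {}
    for name in DROP_COUNTERS:
        delta = current.get(name, 0) - previous.get(name, 0)
        if delta > 0:
            growth[name] = delta
    return growth
```

An empty result means none of the watched counters increased between the two entries. A growing TotalEventsDroppedAtEventQueueLimit points to the CPU shortage described above.
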

Check Probe Statistics

time="2022-07-18T09:42:57Z" level=info msg="probe stat {\"read\":[224247,224209],\"recvfrom\":[2387,2384],\"recvmmsg\":[26,26],
\"recvmsg\":[1106,1103],\"sendmsg\":[570,560],\"sendto\":[80,80],\"write\":[21898,21850],\"writev\":[682,682]}"

The statistics above list the number of entry and exit probes executed for each function. In some environments, Linux may not execute every return probe. A possible reason is the limit on how many return probes can run in parallel, which by default equals the number of CPUs. This is sometimes not sufficient and causes return probes to be dropped. If you see a large difference between the two counts for a call, set max_active_ret_probe in the configuration to a higher value (10 times the number of CPUs). This setting is also available in the Helm charts and Terraform deployment of Traceable's Platform Agent.
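
The comparison of entry and exit counts can be scripted. The following Python sketch uses hypothetical function names and reads each pair as [entry, exit], following the description above. The 1% threshold for a "large difference" is an arbitrary assumption chosen for illustration, not a documented value, and the log entry is assumed to be on a single line with quotes escaped as in the sample.

```python
import json


def parse_probe_stats(log_line: str) -> dict:
    """Extract the per-call [entry, exit] counts from a `probe stat` entry."""
    start = log_line.index("{")
    end = log_line.rindex("}")
    # The payload is JSON with escaped quotes inside the msg="..." field.
    payload = log_line[start:end + 1].replace('\\"', '"')
    return json.loads(payload)


def return_probe_gaps(stats: dict, threshold: float = 0.01) -> dict:
    """Return calls whose share of missed return probes exceeds threshold.

    The default threshold is a hypothetical value, not a documented one.
    """
    gaps = {}
    for call, (entries, exits) in stats.items():
        if entries > 0 and (entries - exits) / entries > threshold:
            gaps[call] = entries - exits
    return gaps


def suggested_max_active_ret_probe(cpu_count: int) -> int:
    """Apply the 10 x CPUs guidance from the text above."""
    return 10 * cpu_count
```

With the sample entry above and the assumed 1% threshold, only sendmsg is flagged (10 missed return probes out of 570 entries). On a 4-CPU node, the guidance gives a max_active_ret_probe value of 40.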