- 19 Oct 2023
eBPF
- Updated on 19 Oct 2023
eBPF is a virtual machine and a set of libraries that enable the creation of in-kernel programs, known as eBPF programs. eBPF programs can be used for purposes such as packet filtering, tracing, and performance analysis. eBPF is implemented in the Linux kernel and is available in recent versions of the Linux operating system.
Linux divides its memory space into two areas: kernel space and user space. Kernel space is where the core of the operating system resides, and it has unrestricted access to all hardware, such as memory, CPU, and storage. User space is where user applications run. User space code has limited direct access to hardware and relies on kernel space to complete its operations.
eBPF is a mechanism for writing code that is executed in the Linux kernel space. It lets you package user space application logic as bytecode that runs in the kernel.
Traceable provides an eBPF solution that attaches probes to kernel functions and collects the data. The probes are attached to the functions which are executed during any network socket transaction, like open, connect, read, write, and close calls. Based on parameters, Traceable decides whether to collect data or not.
The following diagram shows a high-level flow of how Traceable's eBPF collection-based solution works:
Before you begin
Make sure that the following prerequisites are met before you install the eBPF-based Traceable agent.
- Linux kernel – The following kernel versions are supported with BTF (BPF Type Format) enabled:
  - RHEL 7 and CentOS 7 – The underlying Linux kernel should be 3.10.0-1160.76 or later.
  - Ubuntu, Debian, and RHEL 8 – The underlying Linux kernel should be 4.18 or later.
- Kernel build – The Linux kernel must be built with the CONFIG_DEBUG_INFO_BTF=y option. To check, enter the following command and look for CONFIG_DEBUG_INFO_BTF=y in the output:
cat /boot/config-$(uname -r) | grep BTF
- Capabilities – SYS_PTRACE and SYS_ADMIN capabilities in Kubernetes. You can check this in Traceable's Helm template. A snippet is shown below:
capabilities:
  add:
    - SYS_PTRACE
    - SYS_ADMIN
- Traceable agent – Traceable agent 1.19.2 or later.
- Traceable access token – In the Traceable platform (UI), navigate to Administration > Access Token and click Generate Agent Token. Copy the token.
- Privileged user – The deployment requires privileged user access. See step 2 of the Installation section for ebpfRunAsPrivileged: true.
- The eBPF solution intercepts traffic at the kernel level, so no specific ports need to be opened to install Traceable's agent.
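The kernel prerequisites above can be checked with a short script. The following is a minimal sketch, not part of the Traceable tooling: the function names `has_btf`, `kernel_at_least`, and `check_host` are hypothetical, and the sketch assumes that the kernel config file lives at /boot/config-<release>, as in the command shown above.

```python
import platform
import re


def has_btf(config_text: str) -> bool:
    """Return True if the kernel config text enables CONFIG_DEBUG_INFO_BTF."""
    return any(
        line.strip() == "CONFIG_DEBUG_INFO_BTF=y"
        for line in config_text.splitlines()
    )


def kernel_at_least(release: str, minimum: tuple) -> bool:
    """Compare the leading numeric fields of a kernel release with a minimum."""
    numbers = tuple(int(n) for n in re.findall(r"\d+", release)[: len(minimum)])
    return numbers >= minimum


def check_host(minimum: tuple = (4, 18)) -> bool:
    """Check the running host (assumes /boot/config-<release> is readable)."""
    release = platform.release()
    with open(f"/boot/config-{release}") as config_file:
        return kernel_at_least(release, minimum) and has_btf(config_file.read())
```

For RHEL 7 or CentOS 7 hosts, the minimum could be passed as (3, 10, 0, 1160, 76) to reflect the version listed above.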
Installation
You can install the Traceable agent for eBPF using either a Helm chart or Terraform.
Option 1 - Installation using Helm
Complete the following steps to install Traceable agent for eBPF using Helm:
- Create namespace - Enter the following command to create a separate namespace for Traceable:
kubectl create namespace traceableai
- Define values.yaml - Define a sample values.yaml file to install the agent. For example:
token: <ACCESS_TOKEN>
environment: <ENVIRONMENT_NAME>
runAsDaemonSet: false
daemonSetMirroringEnabled: true
ebpfCaptureEnabled: true
ebpfRunAsPrivileged: true
- (Optional) Configure tolerations - You can configure tolerations for eBPF pods by providing the ebpfTolerations variable in the values.yaml file above. For more information, see Taints and Tolerations.
ebpfTolerations:
  - key: "env"
    operator: "Equal"
    value: "prod"
  - key: "your-app"
    operator: "Exists"
- Run the following commands to install the Traceable agent in daemonset mode:
helm repo add traceableai https://helm.traceable.ai
helm repo update
helm install --namespace traceableai traceable-agent traceableai/traceable-agent --values values.yaml
- Verify that the Traceable agent pods are created. Enter the following command:
kubectl get pods -n traceableai
The output should be similar to the following:
NAME                               READY   STATUS    RESTARTS   AGE
traceable-agent-6b87685fb4-ghb58   1/1     Running   0          17m
traceable-ebpf-tracer-ds-2kx2l     1/1     Running   0          55s
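The verification step above can also be scripted. The following is a minimal sketch that scans the text printed by kubectl get pods and reports pods that are not fully ready. The function name `pods_not_ready` is hypothetical, and the sketch assumes the default column layout shown in the example output (NAME, READY, STATUS, and so on).

```python
def pods_not_ready(kubectl_output: str) -> list:
    """Return the names of pods that are not Running or not fully ready."""
    problems = []
    for line in kubectl_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 3:
            continue  # ignore blank or malformed rows
        name, ready, status = fields[0], fields[1], fields[2]
        ready_count, _, total = ready.partition("/")
        if status != "Running" or ready_count != total:
            problems.append(name)
    return problems
```

An empty list suggests that every listed pod is running with all of its containers ready.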
Option 2 - Installation using Terraform
Complete the following steps to install Traceable agent for eBPF using Terraform:
- Download - Enter the following command to download the Traceable Platform agent Terraform tarball:
curl -O https://downloads.traceable.ai/install/traceable-agent/terraform/kubernetes/latest/traceable-agent-tf-k8s.tar.gz
- Untar and change directory - Enter the following commands to untar the tarball and change directory:
tar xvzf traceable-agent-tf-k8s.tar.gz
cd traceable-agent-tf-k8s
- Create namespace - Enter the following command to create a separate namespace for Traceable:
kubectl create namespace traceableai
- tfvars file - Create a terraform.tfvars file. A sample file is shown below:
token = ""
endpoint = "api.traceable.ai"
environment = ""
run_as_daemon_set = false
daemon_set_mirroring_enabled = true
ebpf_capture_enabled = true
ebpf_run_as_privileged = true
- (Optional) Configure tolerations - You can configure tolerations for eBPF pods by providing the ebpf_tolerations variable in the terraform.tfvars file above. For more information, see Taints and Tolerations.
ebpf_tolerations = [
  {
    key = "your-app",
    operator = "Exists",
    value = null,
    effect = "NoSchedule",
    toleration_seconds = null,
  }
]
Explanation:
- ebpf_tolerations: This is a list containing the tolerations for eBPF pods.
- key: The key represents the taint on the node that the toleration is targeting.
- operator: The operator specifies how the toleration should be evaluated. In this example, “Exists” means that as long as the taint key exists on the node, the toleration will be valid.
- value: The value is associated with the taint key, but in this case, it is set to null. This means that any value associated with the taint key is accepted.
- effect: The effect determines which type of taint effect the toleration matches. In this case, the effect is "NoSchedule", so the toleration allows the pod to be scheduled on nodes that carry a NoSchedule taint with the specified key.
- toleration_seconds: This field is also set to null. In Kubernetes, it limits how long a pod stays bound to a node after a NoExecute taint is added. Because it is null here, no time limit is defined for the toleration.
- Apply - Enter the following commands to apply Terraform:
terraform init
terraform apply
- Verification - Enter the following command to verify a successful installation:
kubectl get pods -n traceableai
Following is an example output of a successful installation:
NAME                               READY   STATUS    RESTARTS   AGE
traceable-agent-6b87685fb4-ghb58   1/1     Running   0          17m
traceable-ebpf-tracer-ds-2kx2l     1/1     Running   0          55s
You can also verify a successful installation by navigating to API Catalog → Services and checking for ebpf in the traceable.module.name field, as shown in the screenshot below.
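The toleration fields used in both installation options (key, operator, value, and effect) can be read as a matching rule between a pod and a node taint. The following is a simplified sketch of how Kubernetes evaluates that rule. It is an illustration only, not Traceable or Kubernetes source code, and the function name `tolerates` and the dictionary layout are assumptions made for this example.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Return True if a toleration matches a node taint (simplified rules)."""
    # An unset effect on the toleration matches every taint effect.
    effect = toleration.get("effect")
    if effect and effect != taint.get("effect"):
        return False

    key = toleration.get("key")
    operator = toleration.get("operator") or "Equal"

    if operator == "Exists":
        # The taint value is ignored; an unset key tolerates every taint key.
        return not key or key == taint.get("key")
    if operator == "Equal":
        same_value = (toleration.get("value") or "") == (taint.get("value") or "")
        return key == taint.get("key") and same_value
    return False
```

With the sample ebpf_tolerations entry, a node tainted with your-app=anything:NoSchedule is tolerated, while the same key with a NoExecute effect is not.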
Enable or disable mirroring
To configure mirroring, go through the following points:
Enable mirroring for all namespaces
Mirroring is disabled by default. To enable mirroring for all namespaces, use the following configuration:
- If you are using Helm, set daemonSetMirrorAllNamespaces: true in values.yaml.
- If you are using Terraform, set daemon_set_mirror_all_namespaces = true in main.tf.
Enable mirroring for a namespace
To enable mirroring for a namespace, set the namespace label traceableai-mirror to enabled. Enter the following command:
kubectl label ns <namespace> traceableai-mirror=enabled
Disable mirroring for a namespace
To disable mirroring for a namespace, set the namespace label traceableai-mirror to disabled. If the label is already set on the namespace, append --overwrite to the command. Enter the following command:
kubectl label ns <namespace> traceableai-mirror=disabled
Disable mirroring for a pod
To disable mirroring for a pod, set the pod annotation mirror.traceable.ai/enabled to false. Enter the following command:
kubectl patch deployment <deployment> -n <namespace> -p '{"spec": {"template":{"metadata":{"annotations":{"mirror.traceable.ai/enabled":"false"}}}} }'
Set the mirroring mode
By default, only ingress traffic is captured. However, you can capture only egress traffic, or both ingress and egress traffic, by setting the appropriate annotations.
Capture egress traffic
To capture the egress traffic for a deployment or namespace, set the following annotations.
Deployment
To capture the egress traffic, set the deployment annotation mirror.traceable.ai/mode to egress. Enter the following command:
kubectl patch deployment <deployment> -n <namespace> -p '{"spec": {"template":{"metadata":{"annotations":{"mirror.traceable.ai/mode":"egress"}}}} }'
Namespace
To capture egress traffic at the namespace level, set the annotation mirror.traceable.ai/defaultMode to egress. Enter the following command:
kubectl annotate namespace <NAMESPACE> mirror.traceable.ai/defaultMode=egress
Capture ingress and egress traffic
To capture both ingress and egress traffic for a deployment or namespace, set the following annotations.
Deployment
To capture ingress and egress traffic for a deployment, set the deployment annotation mirror.traceable.ai/mode to ingress_and_egress. Enter the following command:
kubectl patch deployment <deployment> -n <namespace> -p '{"spec": {"template":{"metadata":{"annotations":{"mirror.traceable.ai/mode":"ingress_and_egress"}}}} }'
Namespace
To capture the ingress and egress traffic at a namespace level, set the annotation mirror.traceable.ai/defaultMode to ingress_and_egress. Enter the following command:
kubectl annotate namespace <NAMESPACE> mirror.traceable.ai/defaultMode=ingress_and_egress
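The nested JSON passed to kubectl patch in the deployment commands above is easy to mistype by hand. The following is a minimal sketch that builds the same payload programmatically. The helper names `mode_patch` and `patch_command` are hypothetical, and only the two mode values documented above are accepted.

```python
import json

# Mode values documented in this section for mirror.traceable.ai/mode.
DOCUMENTED_MODES = ("egress", "ingress_and_egress")


def mode_patch(mode: str) -> str:
    """Build the JSON patch that sets the mode annotation on the pod template."""
    if mode not in DOCUMENTED_MODES:
        raise ValueError(f"unsupported mirroring mode: {mode!r}")
    patch = {
        "spec": {
            "template": {
                "metadata": {"annotations": {"mirror.traceable.ai/mode": mode}}
            }
        }
    }
    return json.dumps(patch)


def patch_command(deployment: str, namespace: str, mode: str) -> list:
    """Return the kubectl argument list; pass it to subprocess.run to apply it."""
    return [
        "kubectl", "patch", "deployment", deployment,
        "-n", namespace, "-p", mode_patch(mode),
    ]
```

Building the argument list, rather than a single shell string, avoids quoting problems with the embedded JSON.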
Logs
The eBPF package lets you customize various logging configurations. These configurations are part of the ebpf-tracer-config.yaml file. Following is a snippet of the logging configurations:
logging:
  level: info
  encoding: "json"
  output_paths:
    - stdout
  error_output_paths:
    - stderr
  log_rotation:
    enabled: false
    filename: "tracer.log"
    filepath: "/var/traceable/log/ebpf-tracer"
    max_size: 10 # megs
    max_backups: 10
The following table explains the configurations:
Parameter | Description |
---|---|
level | Defines the logging level. The values can be trace, debug, info, warn, or error. The default value is info. |
encoding | Sets the logger's encoding. Valid values are json or console. |
output_paths | The path to which output is sent. The default value is stdout. |
error_output_paths | The stream to which error logs are printed. The default value is stderr. |
Log rotation
The following table specifically explains the configurations for log rotation from the above snippet:
Parameter | Description |
---|---|
enabled | Defines whether log rotation is enabled. The default value is false. |
filename | The name of the file to which logs are written. The default value is tracer.log. |
filepath | The path to save the log file. |
max_size | The file size in megabytes, after which the log file is rotated. |
max_backups | The number of previous backups that are retained. |
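To illustrate how max_size and max_backups interact, the following sketch reproduces the same size-based rotation policy with Python's standard logging module. It is an analogy only and does not show how ebpf-tracer implements rotation. The function name `build_rotating_logger` is hypothetical, and its parameters simply mirror the configuration keys above.

```python
import logging
import os
from logging.handlers import RotatingFileHandler


def build_rotating_logger(filepath: str, filename: str = "tracer.log",
                          max_size_mb: float = 10, max_backups: int = 10):
    """Create a logger that rotates its file once it reaches max_size_mb."""
    os.makedirs(filepath, exist_ok=True)
    handler = RotatingFileHandler(
        os.path.join(filepath, filename),
        maxBytes=int(max_size_mb * 1024 * 1024),  # rotate at this size
        backupCount=max_backups,                  # keep this many old files
    )
    logger = logging.getLogger("rotation-sketch." + filepath)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep sketch output out of the root logger
    logger.addHandler(handler)
    return logger
```

With max_backups set to 2, for example, at most three files exist at any time: the active tracer.log plus tracer.log.1 and tracer.log.2.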
Upgrade
You can upgrade the Traceable agent in Kubernetes using the following Helm commands:
- Update the Helm charts by entering the following command:
helm repo update traceableai
- Enter the following command to upgrade the Traceable agent to the latest version:
helm upgrade traceable-agent --namespace traceableai traceableai/traceable-agent
Uninstall
Enter the following command to uninstall the Platform agent using Helm:
helm uninstall traceable-agent --namespace traceableai
Troubleshooting
Troubleshooting for eBPF starts with collecting container logs. Enter the following command to collect the logs:
kubectl logs -n traceableai traceable-agent-bzxts traceable-ebpf-tracer
Following are a few of the steps that you can take to troubleshoot eBPF issues:
Verify correct configuration
The current configuration is part of the logs. As soon as the configuration is parsed, it is available in the logs. A sample log entry of the configuration is shown below:
time="2022-07-18T09:39:56Z" level=info msg="config log_level:{value:\"info\"} proc_fs_path:{value:\"/hostproc\"}
unix_domain_socket_addr:{value:\"/var/log/sock/eve.json\"}perfmap_queue_size:{value:1024} uds_event_queue_size:
{value:10000} probe_event_queue_size:{value:50000}capture_all_namespaces:{} k8s_enabled:{value:true}
mode:{value:\"all\"} max_active_ret_probe:{value:1}"
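When you compare the logged configuration with what you intended to deploy, it can help to turn the entry into a dictionary. The following is a minimal sketch that assumes the name:{value:...} layout seen in the sample entry above, with the entry on a single line. The function name `parse_config_entry` is hypothetical, and the real log format may differ between agent versions.

```python
import re

# Matches name:{value:"text"}, name:{value:123}, and empty name:{} fields.
_FIELD = re.compile(r'(\w+):\{(?:value:(?:\\?"(.*?)\\?"|([^}]*)))?\}')


def _convert(raw: str):
    """Convert an unquoted value into a bool or int where possible."""
    if raw in ("true", "false"):
        return raw == "true"
    if raw.isdigit():
        return int(raw)
    return raw


def parse_config_entry(message: str) -> dict:
    """Extract configuration fields from a logged configuration entry."""
    config = {}
    for match in _FIELD.finditer(message):
        name, quoted, bare = match.groups()
        if quoted is not None:
            config[name] = quoted          # quoted string value
        elif bare is not None:
            config[name] = _convert(bare)  # number or boolean
        else:
            config[name] = None            # field logged without a value
    return config
```

For example, the parsed value of max_active_ret_probe can then be compared with the value set in your Helm or Terraform configuration.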
Error related to BTF (BPF Type Format) not found
You are not likely to encounter this error on recent Linux kernels, as most of them have the vmlinux file in the /boot directory. The /boot directory contains the debug information that is required to run the eBPF program. For Linux kernels that do not have the vmlinux file, Traceable ships BTF files in the eBPF container. These BTF files are available on Traceable's download site.
The ebpf-tracer first checks for the vmlinux file. It then checks for a BTF file locally, based on the OS information. If ebpf-tracer does not find the file, it downloads it from the download site. If all these steps fail, reach out to Traceable support with the OS details from the log files. The OS information is available in the logs when vmlinux is not found.
time="2022-07-18T09:39:57Z" level=info msg=system info {"sysinfo":{"version":"0.9.5","timestamp":"2022-07-18T10:22:37.753642303Z"},
"node":{"hostname":"sant","machineid":"4b4d738cd6864265b10089357502600c",
"hypervisor":"vmware","timezone":"Etc/UTC"},"os":{"name":"Ubuntu 18.04.6 LTS",
"vendor":"ubuntu","version":"18.04","release":"18.04.6","architecture":"amd64"},
"kernel":{"release":"4.19.0-041900-generic","version":"#201810221809 SMP Mon Oct 22 22:11:45 UTC 2018",
"architecture":"x86_64"},"product":{"name":"VMware Virtual Platform",
"vendor":"VMware, Inc.","version":"None","serial":"VMware-56 4d c7 1f 27 9f 91 58-0c af 0a e1 90 79 28 bb"},
"board":{"name":"440BX Desktop Reference Platform","vendor":"Intel Corporation","version":"None","serial":"None"},
"chassis":{"type":1,"vendor":"No Enclosure","version":"N/A","serial":"None",
"assettag":"No Asset Tag"},"bios":{"vendor":"Phoenix Technologies LTD","version":"6.00","date":"11/12/2020"},
"cpu":{"vendor":"GenuineIntel","model":"Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz","speed":2600,"cache":12288,"threads":2},"memory":{"type":"DRAM","size":4096},
"storage":[{"name":"sda","driver":"sd","vendor":"VMware,","model":"VMware Virtual S","size":21}],
"network":[{"name":"ens33","driver":"e1000","macaddress":"00:0c:29:79:28:bb","port":"tp","speed":1000}]}
Check if pods are being tracked
For each pod, the logs contain the following information:
time="2022-07-18T09:39:57Z" level=info
msg="Added pod to maps. {\"Name\":\"linkerd-proxy-injector-6848fbbc4-hs4wj\",
\"Namespace\":\"linkerd\",\"Service\":\"linkerd-proxy-injector.linkerd\",\"Enabled\":false,\"Mode\":0}"
- Name: name of the pod
- Namespace: namespace of the pod
- Service: service to which the pod belongs
- Enabled: true if ebpf-tracer is tracking this pod
- Mode: 0 for ingress and 1 for egress
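To find pods that are not being tracked, you can filter the "Added pod to maps" entries by the Enabled field described above. The following is a minimal sketch that assumes each entry sits on a single line with escaped quotes, as in the sample. The function names `parse_pod_entry` and `untracked_pods` are hypothetical.

```python
import json

MARKER = "Added pod to maps. "
MODE_NAMES = {0: "ingress", 1: "egress"}  # values documented above


def parse_pod_entry(log_line: str):
    """Return the pod record from an 'Added pod to maps' line, else None."""
    _, found, payload = log_line.partition(MARKER)
    if not found or "{" not in payload:
        return None
    payload = payload.replace('\\"', '"')  # undo the quote escaping in msg
    pod, _ = json.JSONDecoder().raw_decode(payload, payload.index("{"))
    pod["ModeName"] = MODE_NAMES.get(pod.get("Mode"), "unknown")
    return pod


def untracked_pods(log_lines) -> list:
    """List namespace/name for every pod logged with Enabled set to false."""
    pods = (parse_pod_entry(line) for line in log_lines)
    return [
        f"{pod['Namespace']}/{pod['Name']}"
        for pod in pods
        if pod is not None and not pod.get("Enabled")
    ]
```

A pod that appears in this list may be missing the namespace label or carrying the annotation that disables mirroring.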
Check Statistics
You can check the statistics to see whether requests are being parsed or dropped.
time="2022-07-18T09:41:57Z" level=info
msg="stats {\"ControlEventReceived\":0,\"DataEventReceived\":0,\
"TotalRequestsParsed\":0,\
"ReqParsingErrors\":0,\
"ResParsingErrors\":0,\
"EventLost\":0,\
"TotalRequestsSent\":0,\
"TotalEventsDroppedAtEventQueueLimit\":0,\
"TotalEventsDroppedAtParsing\":0,\
"KprobeEventMaxQueueSizeTillNow\":1}"
- ControlEventReceived: kprobe events received from the kernel for accept, connect, and close calls. Kprobes is a debugging mechanism for the Linux kernel that can also be used to monitor events inside a production system. You can use it to find performance bottlenecks, log specific events, trace problems, and so on.
- DataEventReceived: [k/u]probe (kprobes and uprobes) received from kernel with data (HTTP).
- TotalRequestsParsed: Total requests parsed successfully.
- ReqParsingErrors: Errors occurred during parsing of requests.
- ResParsingErrors: Errors occurred during parsing of responses.
- EventLost: Events lost while reading from perf buffers. This happens when events are consumed more slowly than they are produced in the kernel.
- TotalRequestsSent: Total Spans queued for sending to the Traceable Platform Agent.
- TotalEventsDroppedAtEventQueueLimit: ebpf-tracer maintains a queue of events for parsing. If this count keeps increasing, ebpf-tracer needs more CPUs.
- TotalEventsDroppedAtParsing: Number of events dropped during parsing of data. This can occur due to out-of-order events.
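Because these counters are cumulative, a single value says less than its growth between two log entries. The following is a minimal sketch that compares two stats entries and reports which drop counters increased. It assumes each entry sits on a single line in the form msg="stats {...}" with escaped quotes. The function names `parse_stats` and `growing_drop_counters` are hypothetical.

```python
import json

# Counters from the list above that indicate lost or dropped events.
DROP_COUNTERS = (
    "EventLost",
    "TotalEventsDroppedAtEventQueueLimit",
    "TotalEventsDroppedAtParsing",
)


def parse_stats(log_line: str) -> dict:
    """Extract the counters from a stats log entry."""
    start = log_line.find("stats {")
    if start == -1:
        raise ValueError("not a stats log entry")
    payload = log_line[start + len("stats "):].replace('\\"', '"')
    stats, _ = json.JSONDecoder().raw_decode(payload)
    return stats


def growing_drop_counters(previous: dict, current: dict) -> dict:
    """Return each drop counter that increased, with the size of the increase."""
    return {
        name: current.get(name, 0) - previous.get(name, 0)
        for name in DROP_COUNTERS
        if current.get(name, 0) > previous.get(name, 0)
    }
```

A steadily growing TotalEventsDroppedAtEventQueueLimit value, for example, points to the CPU shortage described above.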
Check Probe Statistics
time="2022-07-18T09:42:57Z" level=info msg="probe stat {\"read\":[224247,224209],\"recvfrom\":[2387,2384],\"recvmmsg\":[26,26],
\"recvmsg\":[1106,1103],\"sendmsg\":[570,560],\"sendto\":[80,80],\"write\":[21898,21850],\"writev\":[682,682]}"
The statistics above list the number of entry and exit (return) probes executed for each function. In some environments, Linux does not execute every return probe. A possible reason is the kernel limit on the number of return probes that can run in parallel, which by default equals the number of CPUs. This is sometimes not sufficient and causes return probes to be dropped. If you see a large difference between the two counts for a call, set max_active_ret_probe in the configuration to a higher value (10 times the number of CPUs). This setting is also available in the Helm charts and Terraform deployment of Traceable's Platform Agent.
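The comparison between entry and exit counts can be automated. The following is a minimal sketch that computes the number of missed return probes per call and applies the 10 times the number of CPUs guideline. It assumes a single-line msg="probe stat {...}" entry in which each pair is ordered as [entry, exit]. The function names are hypothetical.

```python
import json


def parse_probe_stats(log_line: str) -> dict:
    """Extract the per-call [entry, exit] counters from a probe stat entry."""
    start = log_line.find("probe stat {")
    if start == -1:
        raise ValueError("not a probe stat log entry")
    payload = log_line[start + len("probe stat "):].replace('\\"', '"')
    stats, _ = json.JSONDecoder().raw_decode(payload)
    return stats


def missed_return_probes(probe_stats: dict) -> dict:
    """Return entry count minus exit count for every call."""
    return {call: entry - exit_ for call, (entry, exit_) in probe_stats.items()}


def suggested_max_active_ret_probe(cpu_count: int) -> int:
    """Apply the guideline of 10 times the number of CPUs."""
    return 10 * cpu_count
```

On the node itself, os.cpu_count() can supply the cpu_count argument. What counts as a large difference is left to your judgment, since no threshold is defined here.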