eBPF
  • 18 Nov 2022


Linux divides its memory space into two areas: kernel space and user space. Kernel space is where the core of the operating system resides, and it has unrestricted access to all hardware, that is, memory, CPU, storage, and so on. User space is where user applications run. User space code has limited direct access to hardware and relies on the kernel to complete its operations. eBPF (extended Berkeley Packet Filter) is a mechanism for writing code that is executed in Linux kernel space. With eBPF, logic written for a user space application is packaged as bytecode and executed in kernel space.

Traceable provides an eBPF solution that attaches probes to kernel functions and collects data. The probes are attached to the functions that are executed during any network socket transaction, such as the open, connect, read, write, and close calls. Based on parameters, Traceable decides whether to collect data or not.

Note:
Request blocking is not supported with the eBPF-based Traceable agent deployment.

The following diagram shows a high-level flow of how Traceable's eBPF-based collection works:



Before you begin

Make sure that the following prerequisites are met before you install the eBPF-based Traceable agent.

  • Linux kernel - The following kernel versions are supported with BTF (BPF Type Format) enabled:
    • RHEL 7 and CentOS 7 - The underlying Linux kernel should be 3.10.0-1160.76 or later.
    • Ubuntu, Debian, and RHEL 8 - The underlying Linux kernel should be 4.18 or later.
  • Kernel build - The Linux kernel must be built with the CONFIG_DEBUG_INFO_BTF=y option. To check this, enter the following command and look for CONFIG_DEBUG_INFO_BTF=y in the output:
    cat /boot/config-$(uname -r) | grep BTF
  • Capabilities - SYS_PTRACE and SYS_ADMIN capabilities in Kubernetes. You can check this in Traceable's Helm template. A snippet is shown below:
    capabilities:
      add:
      - SYS_PTRACE
      - SYS_ADMIN
  • Traceable agent - Traceable agent 1.19.2 or later.
  • Traceable access token - In the Traceable platform (UI), navigate to Administration > Access Token and click Generate Agent Token. Copy the token.
  • Privileged user - The deployment requires privileged user access. See step 2 of the Installation section, which sets ebpfRunAsPrivileged: true.
Note
Because the eBPF solution intercepts traffic at the kernel level, no specific ports need to be opened to install Traceable's agent.
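
The kernel build prerequisite can also be checked from a script. The following is a minimal sketch, not part of Traceable's tooling: the function name check_btf_config is hypothetical, and it assumes the kernel configuration is available as a plain-text file such as /boot/config-$(uname -r).

```shell
#!/bin/sh
# Sketch: report whether a kernel config file enables BTF.
# check_btf_config is a hypothetical helper name used only for illustration.
check_btf_config() {
    config_file="$1"
    # The config file may be absent or unreadable on some distributions.
    if [ ! -r "$config_file" ]; then
        echo "unknown"
        return 2
    fi
    # Match the exact option line, not commented-out or related options.
    if grep -q '^CONFIG_DEBUG_INFO_BTF=y$' "$config_file"; then
        echo "enabled"
    else
        echo "missing"
        return 1
    fi
}

# Usage on the running kernel:
#   check_btf_config "/boot/config-$(uname -r)"
```

The function prints enabled, missing, or unknown, and returns a non-zero status for the last two so that it can gate an install script.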

Installation

Complete the following steps to install Traceable agent for eBPF:

  1. Create a traceableai namespace.
    kubectl create namespace traceableai
  2. Define a sample values.yaml file to install the agent. For example:
    YAML
    token: <>
    environment: <>
    runAsDaemonSet: false
    daemonSetMirroringEnabled: true
    ebpfCaptureEnabled: true
    ebpfRunAsPrivileged: true
    Note
    If you are upgrading your Traceable agent from a version earlier than 1.25.0, make sure that runAsDaemonSet is set to false.
    Paste the access token that you copied from the Traceable platform in the token field.
  3. Run the following commands to install the Traceable agent in daemonset mode:
    helm repo add traceableai https://helm.traceable.ai
    helm repo update
    helm install --namespace traceableai traceable-agent traceableai/traceable-agent --values values.yaml
  4. Verify that the Traceable agent pods are created. Enter the following command:
    % kubectl get pods -n traceableai
    NAME                    READY   STATUS    RESTARTS   AGE
    traceable-agent-49nh9   2/2     Running   0          49s
    The output of the get pods command differs based on your deployment environment.
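
The verification in step 4 can be scripted by filtering the get pods output. The sketch below is an illustration only: pods_not_ready is a hypothetical helper name, and it assumes the default column layout shown above (NAME, READY, STATUS).

```shell
#!/bin/sh
# Sketch: print the names of pods that are not fully ready.
# Reads `kubectl get pods` output on stdin; pods_not_ready is a hypothetical name.
pods_not_ready() {
    awk 'NR > 1 {
        # READY has the form ready/total, for example 2/2.
        split($2, ready, "/")
        if (ready[1] != ready[2] || $3 != "Running") print $1
    }'
}

# Usage:
#   kubectl get pods -n traceableai | pods_not_ready
```

Empty output means every listed pod is Running with all of its containers ready.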

You can also verify a successful installation by navigating to API Catalog > Services and checking for ebpf in the traceable.module.name field, as shown in the screenshot below.


Enable or disable mirroring

Use the following options to configure mirroring:

Enable mirroring for all namespaces

Mirroring is disabled by default. To enable mirroring for all namespaces, use the following configuration:

  • If you are using Helm, set daemonSetMirrorAllNamespaces: true in values.yaml.
  • If you are using Terraform, set daemon_set_mirror_all_namespaces = true in main.tf.

Enable mirroring for a namespace

To enable mirroring for a namespace, set the namespace label traceableai-mirror to enabled, for example with the following command:

kubectl label ns <namespace> traceableai-mirror=enabled

Disable mirroring for a namespace

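To disable mirroring for a namespace, set the namespace label traceableai-mirror to disabled, for example with the following command. If the namespace already carries the label with a different value, add the --overwrite flag.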

kubectl label ns <namespace> traceableai-mirror=disabled

Disable mirroring for a pod

To disable mirroring for a pod, set the pod annotation mirror.traceable.ai/enabled to false. The following command sets the annotation on the pod template of a deployment:

kubectl patch deployment <deployment> -n <namespace> -p '{"spec": {"template":{"metadata":{"annotations":{"mirror.traceable.ai/enabled":"false"}}}} }'

Set the mirroring mode

The default mirroring mode is ingress. To capture egress traffic, use one of the following options:

For a pod

To capture egress traffic for a pod, set the annotation mirror.traceable.ai/mode to egress.

kubectl patch deployment <deployment> -n <namespace> -p '{"spec": {"template":{"metadata":{"annotations":{"mirror.traceable.ai/mode":"egress"}}}} }'
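
Both pod-level patch commands in this article have the same shape and differ only in the annotation key and value. As a small sketch, a helper can build the patch body; annotation_patch is a hypothetical name, and only the annotation keys come from this article.

```shell
#!/bin/sh
# Sketch: build the JSON patch body that sets one pod-template annotation.
# annotation_patch is a hypothetical helper; $1 is the annotation key, $2 its value.
annotation_patch() {
    printf '{"spec":{"template":{"metadata":{"annotations":{"%s":"%s"}}}}}\n' "$1" "$2"
}

# Usage:
#   kubectl patch deployment <deployment> -n <namespace> \
#     -p "$(annotation_patch mirror.traceable.ai/mode egress)"
```

The helper does no JSON escaping, so it is only suitable for simple keys and values such as the ones used here.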

For a namespace

To capture egress traffic at the namespace level, set the annotation mirror.traceable.ai/defaultMode to egress.

kubectl annotate namespace <NAMESPACE> mirror.traceable.ai/defaultMode=egress

Upgrade

You can upgrade the Traceable agent in Kubernetes using the following Helm commands:

  1. Update the Helm charts by entering the following command:
    helm repo update traceableai
  2. Enter the following command to upgrade the Traceable agent to the latest version:
    helm upgrade traceable-agent --namespace traceableai traceableai/traceable-agent

Uninstall

Enter the following command to uninstall the Platform agent using Helm:

helm uninstall traceable-agent --namespace traceableai

Troubleshooting

Troubleshooting for eBPF starts with collecting container logs. Enter the following command to collect the logs, replacing traceable-agent-bzxts with the name of an agent pod in your deployment:

kubectl logs -n traceableai traceable-agent-bzxts traceable-ebpf-tracer
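
When the agent runs on several nodes, it can be convenient to collect the tracer logs from every pod in one pass. The sketch below assumes that all pods in the namespace are agent pods and that each has a traceable-ebpf-tracer container, as in the command above; collect_tracer_logs is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print traceable-ebpf-tracer logs for every pod in a namespace.
# collect_tracer_logs is a hypothetical helper; $1 defaults to traceableai.
collect_tracer_logs() {
    namespace="${1:-traceableai}"
    # -o name prints one pod/<name> entry per line.
    kubectl get pods -n "$namespace" -o name |
        while read -r pod; do
            echo "=== $pod ==="
            # Redirect stdin so the inner command cannot consume the pod list.
            kubectl logs -n "$namespace" "$pod" -c traceable-ebpf-tracer </dev/null
        done
}

# Usage:
#   collect_tracer_logs > tracer-logs.txt
```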

The following steps can help you troubleshoot eBPF issues:

Verify correct configuration

The current configuration is written to the logs as soon as it is parsed. A sample configuration log entry is shown below:

time="2022-07-18T09:39:56Z" level=info msg="config log_level:{value:\"info\"} proc_fs_path:{value:\"/hostproc\"}
unix_domain_socket_addr:{value:\"/var/log/sock/eve.json\"}perfmap_queue_size:{value:1024} uds_event_queue_size:
{value:10000} probe_event_queue_size:{value:50000}capture_all_namespaces:{} k8s_enabled:{value:true}
mode:{value:\"all\"} max_active_ret_probe:{value:1}"

Error related to BTF (BPF Type Format) not found

You are not likely to encounter this error on recent Linux kernels, as most of them have a vmlinux file in the /boot directory. The /boot directory contains the debug information that is required to run the eBPF program. For Linux kernels that do not have a vmlinux file, Traceable ships BTF files in the eBPF container. These BTF files are also available on Traceable's download site.

The ebpf-tracer first checks for the vmlinux file. It then checks for a BTF file locally, based on the OS information. If the ebpf-tracer does not find the file, it downloads it from the download site. If all these steps fail, reach out to Traceable support with the OS details from the log files. The OS information is written to the logs when vmlinux is not found.

time="2022-07-18T09:39:57Z" level=info msg=system info {"sysinfo":{"version":"0.9.5","timestamp":"2022-07-18T10:22:37.753642303Z"},
"node":{"hostname":"sant","machineid":"4b4d738cd6864265b10089357502600c",
"hypervisor":"vmware","timezone":"Etc/UTC"},"os":{"name":"Ubuntu 18.04.6 LTS",
"vendor":"ubuntu","version":"18.04","release":"18.04.6","architecture":"amd64"},
"kernel":{"release":"4.19.0-041900-generic","version":"#201810221809 SMP Mon Oct 22 22:11:45 UTC 2018",
"architecture":"x86_64"},"product":{"name":"VMware Virtual Platform",
"vendor":"VMware, Inc.","version":"None","serial":"VMware-56 4d c7 1f 27 9f 91 58-0c af 0a e1 90 79 28 bb"},
"board":{"name":"440BX Desktop Reference Platform","vendor":"Intel Corporation","version":"None","serial":"None"},
"chassis":{"type":1,"vendor":"No Enclosure","version":"N/A","serial":"None",
"assettag":"No Asset Tag"},"bios":{"vendor":"Phoenix Technologies LTD","version":"6.00","date":"11/12/2020"},
"cpu":{"vendor":"GenuineIntel","model":"Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz","speed":2600,"cache":12288,"threads":2},"memory":{"type":"DRAM","size":4096},
"storage":[{"name":"sda","driver":"sd","vendor":"VMware,","model":"VMware Virtual S","size":21}],
"network":[{"name":"ens33","driver":"e1000","macaddress":"00:0c:29:79:28:bb","port":"tp","speed":1000}]}

Check if pods are being tracked

For each pod, the logs contain the following information:

time="2022-07-18T09:39:57Z" level=info 
msg="Added pod to maps. {\"Name\":\"linkerd-proxy-injector-6848fbbc4-hs4wj\",
\"Namespace\":\"linkerd\",\"Service\":\"linkerd-proxy-injector.linkerd\",\"Enabled\":false,\"Mode\":0}"
  • Name: name of the pod
  • Namespace: namespace of the pod
  • Service: service to which the pod belongs
  • Enabled: true if ebpf-tracer is tracking this pod
  • Mode: 0 for ingress and 1 for egress
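
To find pods that are not being tracked, the "Added pod to maps" entries can be filtered on the Enabled field. The following is a rough sketch that assumes each entry is written on a single physical line with the escaped-quote format shown above (the sample is wrapped here for readability); untracked_pods is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: list pods that ebpf-tracer logged with Enabled set to false.
# Reads tracer logs on stdin and prints namespace/name, one pod per line.
# untracked_pods is a hypothetical helper name.
untracked_pods() {
    grep 'Added pod to maps' |
        grep '\\"Enabled\\":false' |
        sed -n 's|.*\\"Name\\":\\"\([^\\]*\)\\",\\"Namespace\\":\\"\([^\\]*\)\\".*|\2/\1|p'
}

# Usage:
#   kubectl logs -n traceableai <agent-pod> traceable-ebpf-tracer | untracked_pods
```

A pod listed here with mirroring expected to be on is a hint to recheck the namespace label and pod annotations described in the mirroring section.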

Check statistics

The statistics in the logs show whether requests are being parsed or dropped.

time="2022-07-18T09:41:57Z" level=info
msg="stats {\"ControlEventReceived\":0,\"DataEventReceived\":0,
\"TotalRequestsParsed\":0,
\"ReqParsingErrors\":0,
\"ResParsingErrors\":0,
\"EventLost\":0,
\"TotalRequestsSent\":0,
\"TotalEventsDroppedAtEventQueueLimit\":0,
\"TotalEventsDroppedAtParsing\":0,
\"KprobeEventMaxQueueSizeTillNow\":1}"
  • ControlEventReceived: kprobe events received from the kernel for accept, connect, and close calls. Kprobes are a debugging mechanism for the Linux kernel that can also be used to monitor events inside a production system, for example to find performance bottlenecks, log specific events, or trace problems.
  • DataEventReceived: [k/u]probe (kprobe and uprobe) events received from the kernel with data (HTTP).
  • TotalRequestsParsed: Total requests parsed successfully.
  • ReqParsingErrors: Errors that occurred while parsing requests.
  • ResParsingErrors: Errors that occurred while parsing responses.
  • EventLost: Events lost while reading from perf buffers. This happens when events are consumed more slowly than they are produced in the kernel.
  • TotalRequestsSent: Total requests sent to Traceable's Platform Agent.
  • TotalEventsDroppedAtEventQueueLimit: ebpf-tracer maintains a queue of events for parsing. If this count keeps increasing, ebpf-tracer needs more CPU.
  • TotalEventsDroppedAtParsing: Number of events dropped while parsing data. This can occur due to out-of-order events.
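
The drop-related counters can be pulled out of the most recent stats entry with standard text tools. This is a minimal sketch, assuming each stats entry is a single physical line in the real logs (the sample above is wrapped for readability); drop_counters is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print the drop-related counters that are non-zero in the latest stats entry.
# Reads tracer logs on stdin; drop_counters is a hypothetical helper name.
drop_counters() {
    # Keep only the most recent stats line.
    line=$(grep 'msg="stats ' | tail -n 1)
    for name in EventLost TotalEventsDroppedAtEventQueueLimit TotalEventsDroppedAtParsing; do
        value=$(printf '%s\n' "$line" |
            sed -n 's/.*\\"'"$name"'\\":\([0-9][0-9]*\).*/\1/p')
        if [ -n "$value" ] && [ "$value" -gt 0 ]; then
            echo "$name=$value"
        fi
    done
}

# Usage:
#   kubectl logs -n traceableai <agent-pod> traceable-ebpf-tracer | drop_counters
```

Empty output means none of the three counters has recorded a drop so far.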

Check probe statistics

time="2022-07-18T09:42:57Z" level=info msg="probe stat {\"read\":[224247,224209],\"recvfrom\":[2387,2384],\"recvmmsg\":[26,26],
\"recvmsg\":[1106,1103],\"sendmsg\":[570,560],\"sendto\":[80,80],\"write\":[21898,21850],\"writev\":[682,682]}"

The statistics above list, for each function, the number of entry probes and exit (return) probes executed. In some environments, Linux may skip executing a return probe. A possible reason is the kernel limit on the number of return probes that can run in parallel, which by default equals the number of CPUs. This limit is sometimes insufficient and causes return probes to be dropped. If you see a large difference between the two counts for a call, set max_active_ret_probe in the configuration to a higher value (10 times the number of CPUs). This setting is also available in the Helm charts and Terraform deployment of Traceable's Platform Agent.
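
The gap between the two counts can be computed per call from a probe stat entry. The sketch below assumes that the first number of each pair is the entry count and the second is the exit count, and that the entry is a single physical line in the real logs; probe_gaps is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: for each call in a "probe stat" entry, print how many return probes were missed.
# Reads tracer logs on stdin; probe_gaps is a hypothetical helper name.
probe_gaps() {
    # Extract each \"call\":[entry,exit] pair onto its own line.
    grep -o '\\"[a-z]*\\":\[[0-9]*,[0-9]*]' |
        awk '{
            # Strip quotes, backslashes, and brackets, leaving call:entry,exit.
            gsub(/[^a-z0-9:,]/, "")
            split($0, f, /[:,]/)
            printf "%s missed=%d of %d\n", f[1], f[2] - f[3], f[2]
        }'
}

# Usage:
#   kubectl logs -n traceableai <agent-pod> traceable-ebpf-tracer | grep 'probe stat' | tail -n 1 | probe_gaps
```

For the sample entry above, the read call shows 224247 entries and 224209 exits, a gap of 38.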
