BPF for the Little Ones, Part One: Extended BPF

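In the beginning there was a technology called BPF. We looked at it in the previous, Old Testament article of this series. In 2013, through the efforts of Alexei Starovoitov and Daniel Borkmann, an improved version optimized for modern 64-bit machines was developed and included in the Linux kernel. The new technology was briefly called Internal BPF, then renamed Extended BPF, and now, several years later, everyone simply calls it BPF.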



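Roughly speaking, BPF lets arbitrary user-supplied code run in Linux kernel space, and the new architecture turned out to be so successful that we will need a dozen more articles to describe all of its uses. (As the article's teaser image shows, the one thing the developers did not manage was a decent logo.)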



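This article describes the structure of the BPF virtual machine, the kernel interfaces for working with BPF, and the development tools, and gives a short overview of the existing capabilities: everything we will need later for a deeper study of practical BPF applications.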



Article outline



Introduction to the BPF architecture. First we take a bird's-eye view of the BPF architecture and outline its main components.



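Registers and instruction set of the BPF virtual machine.



Lifecycle of BPF objects, the bpffs file system.



Managing objects with the bpf system call, bpf(2).



Writing BPF programs with libbpf.



Kernel Helpers.



Accessing maps from BPF programs.



Development tools.



Conclusion.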



Introduction to the BPF architecture



BPF ( ) BPF, RISC . , , Berkeley UNIX, , .



BPF 64- , SDN (Software-defined networking). BPF, BPF Linux , , , , , , .





BPF — -, «» . BPF , - . , , , - , .. BPF ( , , , ), — , , ..



. BPF, . , , , , C. llvm, - BPF.







BPF , , , . , , - BPF, JIT compiler (Just In Time). , , BPF — . — bpf(2), , , , , (attaches) .



: , ? ? BPF (- verifier ):







Verifier — , , . , , , — BPF, , , , , , . Verifier , BPF , , , , . verifier , , BPF.



, ? C, bpf(2), verifier . . . -, verifier — , , . -, , , «» , . ( , , libbpf.)



, . , , BPF BPF. , — - (kernel helpers). BPF maps — API. , , , map -. , (per-CPU) - , , , BPF . , BPF .



maps bpf(2), BPF, — -. , helpers , . , BPF - , perf, ..







, BPF , .., verifier, . , .



, , BPF (, , , — BPF ). ( , BPF , ), ( , ), — , BPF , ( , BPF).



BPF , : BPF, BPF 24x7, , BPF. BPF : DDoS , SDN (, kubernetes), , - ..



BPF .



:



, , , llvm/clang bpf bpftool. , . , .



Registers and instruction set of the BPF virtual machine



BPF , C . , , . , , , , 4096 ( ).



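BPF has eleven 64-bit registers r0-r10 available to the user, plus a program counter. Register r10 holds the frame pointer and is read-only. At run time a program has a 512-byte stack and an unlimited amount of shared memory in the form of maps.



BPF programs may call a set of kernel helpers that depends on the program type and, more recently, ordinary functions. Each called function takes up to five arguments, passed in registers r1-r5, and returns its value in r0. It is guaranteed that registers r6-r9 are unchanged after the function returns.



For efficient translation, registers r0-r11 are mapped one-to-one onto real registers on all supported architectures, following the ABI of the target architecture. For example, on x86_64 the registers r1-r5, used to pass function arguments, are mapped to rdi, rsi, rdx, rcx, r8, which are used to pass arguments on x86_64. So the code on the left is translated into the code on the right: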



1:  (b7) r1 = 1                    mov    $0x1,%rdi
2:  (b7) r2 = 2                    mov    $0x2,%rsi
3:  (b7) r3 = 3                    mov    $0x3,%rdx
4:  (b7) r4 = 4                    mov    $0x4,%rcx
5:  (b7) r5 = 5                    mov    $0x5,%r8
6:  (85) call pc+1                 callq  0x0000000000001ee8


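Register r0 is also used to return the result of the program, and r1 carries a pointer to the context passed to the program. Depending on the program type this may be, for example, struct xdp_md (for XDP), struct __sk_buff (for various network programs), struct pt_regs (for various tracing programs), and so on.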



, , kernel helpers, , maps. , , ...



. ( ) BPF 64- . 64- Big Endian ,







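Here Code is the instruction encoding, Dst/Src are the encodings of the destination and source registers, Off is a 16-bit signed offset, and Imm is a 32-bit signed integer used as a constant in some instructions (analogous to the constant K in cBPF). The Code field is encoded as follows: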







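Instruction classes 0, 1, 2 and 3 define the commands that work with memory. They are called BPF_LD, BPF_LDX, BPF_ST and BPF_STX, respectively. Classes 4 and 7 (BPF_ALU, BPF_ALU64) make up the set of ALU instructions. Classes 5 and 6 (BPF_JMP, BPF_JMP32) contain the jump instructions.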
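The class lookup described above can be sketched in a few lines of C. This is only an illustration: the helper names bpf_insn_class and bpf_insn_class_name are made up for this example, and the table simply restates the class numbers listed in the text.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Class names indexed by the three low bits of the opcode byte,
 * in the order given in the text (0..7). */
static const char *const bpf_class_names[8] = {
    "BPF_LD", "BPF_LDX", "BPF_ST", "BPF_STX",
    "BPF_ALU", "BPF_JMP", "BPF_JMP32", "BPF_ALU64",
};

/* Hypothetical helper: numeric instruction class of an opcode byte. */
static unsigned bpf_insn_class(uint8_t code)
{
    return code & 0x07;
}

/* Hypothetical helper: symbolic class name of an opcode byte. */
static const char *bpf_insn_class_name(uint8_t code)
{
    unsigned cls = bpf_insn_class(code);

    assert(cls < 8); /* guaranteed by the mask above */
    return bpf_class_names[cls];
}
```

For example, the opcode b7 falls into class 7 (BPF_ALU64), while 15 and 95 both fall into class 5 (BPF_JMP).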



BPF : , , , BPF. Verifier, JIT , BPF, maps, ..



, bpf.h bpf_common.h, BPF. / , , , : Unofficial eBPF spec, BPF and XDP Reference Guide, Instruction Set, Documentation/networking/filter.txt , , Linux — verifier, JIT, BPF.



Example: decoding BPF by hand



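Let's look at an example in which we compile the program readelf-example.c and examine the resulting binary. We will reveal the source of readelf-example.c later, after we have recovered its logic from the binary code: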



$ clang -target bpf -c readelf-example.c -o readelf-example.o -O2
$ llvm-readelf -x .text readelf-example.o
Hex dump of section '.text':
0x00000000 b7000000 01000000 15010100 00000000 ................
0x00000010 b7000000 02000000 95000000 00000000 ................


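The first column of the readelf output is an offset, so our program consists of four instructions. Note that the dump is little-endian: in the second byte the high nibble is Src and the low nibble is Dst, and Off and Imm are shown as raw little-endian bytes:

Code Src Dst Off  Imm
b7   0   0   0000 01000000
15   0   1   0100 00000000
b7   0   0   0000 02000000
95   0   0   0000 00000000


The opcodes are b7, 15, b7 and 95. Recall that the three least significant bits are the instruction class. In our case the fourth bit is zero in every instruction, so the classes are 7, 5, 7 and 5, respectively. Class 7 is BPF_ALU64 and class 5 is BPF_JMP. For both classes the instruction format is the same (see above), so we can rewrite our program like this (with the remaining columns in human-readable form):

Op S  Class   Dst Src Off  Imm
b  0  ALU64   0   0   0    1
1  0  JMP     1   0   1    0
b  0  ALU64   0   0   0    2
9  0  JMP     0   0   0    0


Operation b of class ALU64 is BPF_MOV. It assigns a value to the destination register. If the s (source) bit is set, the value is taken from the source register; if, as in our case, it is not set, the value is taken from the Imm field. So in the first and third instructions we perform r0 = Imm. Next, operation 1 of class JMP is BPF_JEQ (jump if equal). In our case, since the S bit is zero, it compares the Dst register with the Imm field. If the values are equal, control passes to PC + Off, where PC, as usual, holds the address of the next instruction. Finally, operation 9 of class JMP is BPF_EXIT. This instruction terminates the program and returns r0 to the kernel. Let's add a new column to our table:

Op    S  Class   Dst Src Off  Imm    Disassm
MOV   0  ALU64   0   0   0    1      r0 = 1
JEQ   0  JMP     1   0   1    0      if (r1 == 0) goto pc+1
MOV   0  ALU64   0   0   0    2      r0 = 2
EXIT  0  JMP     0   0   0    0      exit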
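The manual decoding above can be sketched as a small C routine. It is an illustration only: struct decoded_insn and decode_insn are names invented for this example, and the byte layout assumes a little-endian dump like the one readelf printed, where the low nibble of the second byte is the destination register (which is why the second instruction disassembles to a comparison on r1).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the fields of one 8-byte BPF instruction. */
struct decoded_insn {
    uint8_t op;     /* upper four bits of Code */
    uint8_t source; /* bit 3 of Code: 0 = use Imm, 1 = use Src register */
    uint8_t cls;    /* lower three bits of Code */
    uint8_t dst;    /* low nibble of byte 1 */
    uint8_t src;    /* high nibble of byte 1 */
    int16_t off;    /* bytes 2..3, little-endian, signed */
    int32_t imm;    /* bytes 4..7, little-endian, signed */
};

/* Decode one instruction from its raw little-endian bytes. */
static struct decoded_insn decode_insn(const uint8_t *b)
{
    struct decoded_insn d;

    assert(b != NULL);
    d.cls = b[0] & 0x07;
    d.source = (b[0] >> 3) & 0x01;
    d.op = b[0] >> 4;
    d.dst = b[1] & 0x0f;
    d.src = b[1] >> 4;
    d.off = (int16_t)(uint16_t)(b[2] | (b[3] << 8));
    d.imm = (int32_t)((uint32_t)b[4] | (uint32_t)b[5] << 8 |
                      (uint32_t)b[6] << 16 | (uint32_t)b[7] << 24);
    return d;
}
```

Feeding it the bytes 15 01 01 00 00 00 00 00 gives class 5, operation 1, Dst 1, Off 1 and Imm 0, which matches the JEQ row.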


:



     r0 = 1
     if (r1 == 0) goto END
     r0 = 2
END:
     exit


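Recall that r1 holds a pointer to the context passed in by the kernel, and r0 holds the value returned to the kernel. So if the context pointer is zero we return 1, and otherwise 2. Let's check that we are right by looking at the source: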



$ cat readelf-example.c
int foo(void *ctx)
{
        return ctx ? 2 : 1;
}


, , .



Exceptions: 16-byte instructions



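There are, in fact, exceptions: instructions that take more than 64 bits. In particular, lddw (Code = 0x18 = BPF_LD | BPF_DW | BPF_IMM) loads a double word from the Imm fields into a register. Imm is 32 bits wide and a double word is 64 bits, so a 64-bit immediate does not fit into a single 64-bit instruction. Two adjacent instructions are used instead, and the second half of the 64-bit value is stored in the Imm field of the second one. Example: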



$ cat x64.c
long foo(void *ctx)
{
        return 0x11223344aabbccdd;
}
$ clang -target bpf -c x64.c -o x64.o -O2
$ llvm-readelf -x .text x64.o
Hex dump of section '.text':
0x00000000 18000000 ddccbbaa 00000000 44332211 ............D3".
0x00000010 95000000 00000000                   ........


:



Binary                                 Disassm
18000000 ddccbbaa 00000000 44332211    r0 = Imm[0]|Imm[1]
95000000 00000000                      exit
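How the two Imm fields combine can be shown with a short C sketch. The function name lddw_imm64 is made up for this example; it assumes the 16 raw bytes of an lddw instruction in the little-endian layout of the dump above.

```c
#include <assert.h>
#include <stdint.h>

/* Read a 32-bit little-endian value from four bytes. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Hypothetical helper: rebuild the 64-bit immediate of a 16-byte lddw.
 * The low half sits in Imm of the first 8-byte slot (bytes 4..7), the
 * high half in Imm of the second slot (bytes 12..15). */
static uint64_t lddw_imm64(const uint8_t *b)
{
    assert(b[0] == 0x18); /* BPF_LD | BPF_DW | BPF_IMM */
    return ((uint64_t)le32(b + 12) << 32) | le32(b + 4);
}
```

Applied to the sixteen bytes from the hex dump of x64.o, it yields 0x11223344aabbccdd, the constant from the C source.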


We will meet the lddw instruction again when we talk about relocations and working with maps.



Example: disassembling BPF with standard tools



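So, we have learned to read BPF binary code and can decode any instruction if necessary. In practice, however, it is faster and more convenient to disassemble programs with standard tools, for example: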



$ llvm-objdump -d x64.o

Disassembly of section .text:

0000000000000000 <foo>:
 0: 18 00 00 00 dd cc bb aa 00 00 00 00 44 33 22 11 r0 = 1234605617868164317 ll
 2: 95 00 00 00 00 00 00 00 exit


Lifecycle of BPF objects, the bpffs file system



( , , Alexei Starovoitov BPF Blog.)



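BPF objects, that is, programs and maps, are created from user space with the BPF_PROG_LOAD and BPF_MAP_CREATE commands of the bpf(2) system call; we will see exactly how in the next section. This creates kernel data structures, each with a refcount (reference counter) set to one, and a file descriptor pointing to the object is returned to the user. When the descriptor is closed the object's refcount drops by one, and when it reaches zero the object is destroyed.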



, refcount , .. refcount :







- . , - tracepoint . -.



? (hook). , , , . , , , ( , «local to the process»). , , — . - .







? userspace, , DDoS — BPF , . , , —  , , .



, , tracepoint . . bpf. - , , , BPF , , refcount . , .







bpffs, BPF «» («pin», : «process can pin a BPF program or map»). BPF , — DDoS, .



By default the BPF file system is mounted at /sys/fs/bpf, but it can also be mounted locally, for example like this:



$ mkdir bpf-mountpoint
$ sudo mount -t bpf none bpf-mountpoint


BPF_OBJ_PIN BPF. - , , bpffs. , , :



$ cat test.c
__attribute__((section("xdp"), used))
int test(void *ctx)
{
        return 0;
}

char _license[] __attribute__((section("license"), used)) = "GPL";


bpffs:



$ clang -target bpf -c test.c -o test.o
$ mkdir bpf-mountpoint
$ sudo mount -t bpf none bpf-mountpoint


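Now let's load our program with bpftool and look at the accompanying bpf(2) system calls (some irrelevant lines are removed from the strace output):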



$ sudo strace -e bpf bpftool prog load ./test.o bpf-mountpoint/test
bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_XDP, prog_name="test", ...}, 120) = 3
bpf(BPF_OBJ_PIN, {pathname="bpf-mountpoint/test", bpf_fd=3}, 120) = 0


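Here we loaded the program with BPF_PROG_LOAD, received file descriptor 3 from the kernel and, with the BPF_OBJ_PIN command, pinned this descriptor as the file "bpf-mountpoint/test". After that the loader bpftool exited, but our program stayed in the kernel, even though we did not attach it to any network interface: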



$ sudo bpftool prog | tail -3
783: xdp  name test  tag 5c8ba0cf164cb46c  gpl
        loaded_at 2020-05-05T13:27:08+0000  uid 0
        xlated 24B  jited 41B  memlock 4096B


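We can remove the file object in the usual way with unlink(2), after which the corresponding program is removed as well: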



$ sudo rm ./bpf-mountpoint/test
$ sudo bpftool prog show id 783
Error: get by id (783): No such file or directory




, , , ( ), , , .



BPF , .. replace = detach old program, attach new program. , , «» , .





, . . , XDP.



Managing objects with the bpf system call



Loading BPF programs



BPF bpf, :



#include <linux/bpf.h>

int bpf(int cmd, union bpf_attr *attr, unsigned int size);


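Here cmd is one of the values of enum bpf_cmd, attr is a pointer to the parameters of the specific command, and size is the size of the object the pointer refers to, i.e. usually sizeof(*attr). In kernel 5.8 the bpf system call supports 34 different commands, and the definition of union bpf_attr takes 200 lines. This should not scare us, since we will get to know the commands and parameters over the course of several articles.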



BPF_PROG_LOAD, BPF — BPF . verifier, JIT compiler , , . BPF.



, BPF, , —  , verifier. , , : BPF_PROG_TYPE_XDP, XDP_PASS ( ). BPF :



r0 = 2
exit


, , , :



#define _GNU_SOURCE
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

static inline __u64 ptr_to_u64(const void *ptr)
{
        return (__u64) (unsigned long) ptr;
}

int main(void)
{
    struct bpf_insn insns[] = {
        {
            .code = BPF_ALU64 | BPF_MOV | BPF_K,
            .dst_reg = BPF_REG_0,
            .imm = XDP_PASS
        },
        {
            .code = BPF_JMP | BPF_EXIT
        },
    };

    union bpf_attr attr = {
        .prog_type = BPF_PROG_TYPE_XDP,
        .insns     = ptr_to_u64(insns),
        .insn_cnt  = sizeof(insns)/sizeof(insns[0]),
        .license   = ptr_to_u64("GPL"),
    };

    strncpy(attr.prog_name, "woo", sizeof(attr.prog_name));
    syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));

    for ( ;; )
        pause();
}


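The interesting part starts with the definition of the array insns: this is our BPF program in machine code. Each instruction of a BPF program is packed into a struct bpf_insn. The first element of insns corresponds to the instruction r0 = 2, and the second to exit.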



. , , tools/include/linux/filter.h



struct bpf_insn insns[] = {
    BPF_MOV64_IMM(BPF_REG_0, XDP_PASS),
    BPF_EXIT_INSN()
};


BPF BPF, .



BPF . attr , , , "woo", , . , , bpf.



, . , bpf, .



, . strace, , :



$ clang -g -O2 simple-prog.c -o simple-prog

$ sudo strace ./simple-prog
execve("./simple-prog", ["./simple-prog"], 0x7ffc7b553480 /* 13 vars */) = 0
...
bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_XDP, insn_cnt=2, insns=0x7ffe03c4ed50, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_V
ERSION(0, 0, 0), prog_flags=0, prog_name="woo", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS}, 72) = 3
pause(


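As expected, bpf(2) returned descriptor 3 to us and we went into an infinite loop with pause(). Let's try to find our program in the system. To do so, we go to another terminal and use bpftool: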



# bpftool prog | grep -A3 woo
390: xdp  name woo  tag 3b185187f1855c4c  gpl
        loaded_at 2020-08-31T24:66:44+0000  uid 0
        xlated 16B  jited 40B  memlock 4096B
        pids simple-prog(10381)


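We see that a program woo with global ID 390 is loaded in the system and that the process simple-prog currently holds an open file descriptor pointing to it (if simple-prog exits, woo disappears). As expected, woo takes 16 bytes, that is two instructions, of binary code in the BPF architecture, but in native form (x86_64) it already takes 40 bytes. Let's look at our program in its original form: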



# bpftool prog dump xlated id 390
   0: (b7) r0 = 2
   1: (95) exit


. , JIT :



# bpftool prog dump jited id 390
bpf_prog_3b185187f1855c4c_woo:
   0:   nopl   0x0(%rax,%rax,1)
   5:   push   %rbp
   6:   mov    %rsp,%rbp
   9:   sub    $0x0,%rsp
  10:   push   %rbx
  11:   push   %r13
  13:   push   %r14
  15:   push   %r15
  17:   pushq  $0x0
  19:   mov    $0x2,%eax
  1e:   pop    %rbx
  1f:   pop    %r15
  21:   pop    %r14
  23:   pop    %r13
  25:   pop    %rbx
  26:   leaveq
  27:   retq


- exit(2), , , , JIT , , .



Maps



BPF , BPF, . maps bpf.



, maps . , , , BPF , perf events .. , . , , . <linux/bpf.h>, , - BPF_MAP_TYPE_HASH.



-, , C++, unordered_map<int,long> woo, - « woo , int, — long». , - BPF , , , . BPF_MAP_CREATE bpf. - , map. , BPF, :



$ cat simple-map.c
#define _GNU_SOURCE
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

int main(void)
{
    union bpf_attr attr = {
        .map_type = BPF_MAP_TYPE_HASH,
        .key_size = sizeof(int),
        .value_size = sizeof(int),
        .max_entries = 4,
    };
    strncpy(attr.map_name, "woo", sizeof(attr.map_name));
    syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));

    for ( ;; )
        pause();
}


attr, « - sizeof(int), ». BPF , , , , "woo".



:



$ clang -g -O2 simple-map.c -o simple-map
$ sudo strace ./simple-map
execve("./simple-map", ["./simple-map"], 0x7ffd40a27070 /* 14 vars */) = 0
...
bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_HASH, key_size=4, value_size=4, max_entries=4, map_name="woo", ...}, 72) = 3
pause(


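Here the bpf(2) system call returned map descriptor 3 to us, and then the program, as expected, waits for further instructions in pause(2).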



background bpftool ( map ):



$ sudo bpftool map
...
114: hash  name woo  flags 0x0
        key 4B  value 4B  max_entries 4  memlock 4096B
...


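The number 114 is the global ID of our object. Any program in the system can use this ID to open the existing map with the BPF_MAP_GET_FD_BY_ID command of the bpf system call.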



-. :



$ sudo bpftool map dump id 114
Found 0 elements


It is empty. Let's put the value hash[1] = 1 into it:



$ sudo bpftool map update id 114 key 1 0 0 0 value 1 0 0 0


:



$ sudo bpftool map dump id 114
key: 01 00 00 00  value: 01 00 00 00
Found 1 element


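It worked: we managed to add one pair. Note that we have to work at the byte level here, because bpftool does not know what type the values in the hash table have. (This knowledge can be passed to it using BTF, but more on that later.)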



bpftool ? :



$ sudo strace -e bpf bpftool map dump id 114
bpf(BPF_MAP_GET_FD_BY_ID, {map_id=114, next_id=0, open_flags=0}, 120) = 3
bpf(BPF_MAP_GET_NEXT_KEY, {map_fd=3, key=NULL, next_key=0x55856ab65280}, 120) = 0
bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=3, key=0x55856ab65280, value=0x55856ab652a0}, 120) = 0
key: 01 00 00 00  value: 01 00 00 00
bpf(BPF_MAP_GET_NEXT_KEY, {map_fd=3, key=0x55856ab65280, next_key=0x55856ab65280}, 120) = -1 ENOENT


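First we opened the map by its global ID with the BPF_MAP_GET_FD_BY_ID command, and bpf(2) returned descriptor 3 to us. Then, with BPF_MAP_GET_NEXT_KEY, we found the first key in the table by passing NULL as the pointer to the "previous" key. Having a key, we can do BPF_MAP_LOOKUP_ELEM, which returns the value through the value pointer. The next step is to find the next element by passing a pointer to the current key, but our table contains only one element, so BPF_MAP_GET_NEXT_KEY returns ENOENT.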



, 1, , - hash[1] = 2:



$ sudo strace -e bpf bpftool map update id 114 key 1 0 0 0 value 2 0 0 0
bpf(BPF_MAP_GET_FD_BY_ID, {map_id=114, next_id=0, open_flags=0}, 120) = 3
bpf(BPF_MAP_UPDATE_ELEM, {map_fd=3, key=0x55dcd72be260, value=0x55dcd72be280, flags=BPF_ANY}, 120) = 0


, : BPF_MAP_GET_FD_BY_ID ID, BPF_MAP_UPDATE_ELEM .



, - . , , . , , :



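  • BPF_MAP_LOOKUP_ELEM: find a value by key
  • BPF_MAP_UPDATE_ELEM: update or create a value
  • BPF_MAP_DELETE_ELEM: delete a key
  • BPF_MAP_GET_NEXT_KEY: find the next (or the first) key
  • BPF_MAP_GET_NEXT_ID: lets you walk over all existing maps; this is how bpftool map works
  • BPF_MAP_GET_FD_BY_ID: open an existing map by its global ID
  • BPF_MAP_LOOKUP_AND_DELETE_ELEM: look up a value and delete the element in one operation
  • BPF_MAP_FREEZE: make the map immutable from userspace (this operation cannot be undone)
  • BPF_MAP_LOOKUP_BATCH, BPF_MAP_LOOKUP_AND_DELETE_BATCH, BPF_MAP_UPDATE_BATCH, BPF_MAP_DELETE_BATCH: bulk operations. For example, BPF_MAP_LOOKUP_AND_DELETE_BATCH reads and resets all values of a map in a single call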


, maps , -.



, -. , , ? :



$ sudo bpftool map update id 114 key 2 0 0 0 value 1 0 0 0
$ sudo bpftool map update id 114 key 3 0 0 0 value 1 0 0 0
$ sudo bpftool map update id 114 key 4 0 0 0 value 1 0 0 0


:



$ sudo bpftool map dump id 114
key: 01 00 00 00  value: 01 00 00 00
key: 02 00 00 00  value: 01 00 00 00
key: 04 00 00 00  value: 01 00 00 00
key: 03 00 00 00  value: 01 00 00 00
Found 4 elements


:



$ sudo bpftool map update id 114 key 5 0 0 0 value 1 0 0 0
Error: update failed: Argument list too long


, . :



$ sudo strace -e bpf bpftool map update id 114 key 5 0 0 0 value 1 0 0 0
bpf(BPF_MAP_GET_FD_BY_ID, {map_id=114, next_id=0, open_flags=0}, 120) = 3
bpf(BPF_OBJ_GET_INFO_BY_FD, {info={bpf_fd=3, info_len=80, info=0x7ffe6c626da0}}, 120) = 0
bpf(BPF_MAP_UPDATE_ELEM, {map_fd=3, key=0x56049ded5260, value=0x56049ded5280, flags=BPF_ANY}, 120) = -1 E2BIG (Argument list too long)
Error: update failed: Argument list too long
+++ exited with 255 +++


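Everything is as expected: the BPF_MAP_UPDATE_ELEM command tries to create a new, fifth, key and fails with E2BIG, because the map was created with max_entries = 4.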



, BPF, . , BPF. , - -, , BPF —  libbpf.



( , : , -, libbpf , . , , .)



Writing BPF programs with libbpf



BPF , . llvm, BPF, libbpf, BPF BPF, llvm/clang.



, , libbpf ( — iproute2, libbcc, libbpf-go, ..) . killer- libbpf BPF CO-RE (Compile Once, Run Everywhere) — , BPF, , API (, ). , CO-RE, BTF ( . , BTF , — :



$ ls -lh /sys/kernel/btf/vmlinux
-r--r--r-- 1 root root 2.6M Jul 29 15:30 /sys/kernel/btf/vmlinux


, , libbpf. CO-RE , —  CONFIG_DEBUG_INFO_BTF.



libbpf tools/lib/bpf bpf@vger.kernel.org. , , https://github.com/libbpf/libbpf - .



, , libbpf, (- ) . , BPF maps, kernel helpers, BTF, ..



, libbpf git submodule, :



$ mkdir /tmp/libbpf-example
$ cd /tmp/libbpf-example/
$ git init-db
Initialized empty Git repository in /tmp/libbpf-example/.git/
$ git submodule add https://github.com/libbpf/libbpf.git
Cloning into '/tmp/libbpf-example/libbpf'...
remote: Enumerating objects: 200, done.
remote: Counting objects: 100% (200/200), done.
remote: Compressing objects: 100% (103/103), done.
remote: Total 3354 (delta 101), reused 118 (delta 79), pack-reused 3154
Receiving objects: 100% (3354/3354), 2.05 MiB | 10.22 MiB/s, done.
Resolving deltas: 100% (2176/2176), done.


libbpf :



$ cd libbpf/src
$ mkdir build
$ OBJDIR=build DESTDIR=root make -s install
$ find root
root
root/usr
root/usr/include
root/usr/include/bpf
root/usr/include/bpf/bpf_tracing.h
root/usr/include/bpf/xsk.h
root/usr/include/bpf/libbpf_common.h
root/usr/include/bpf/bpf_endian.h
root/usr/include/bpf/bpf_helpers.h
root/usr/include/bpf/btf.h
root/usr/include/bpf/bpf_helper_defs.h
root/usr/include/bpf/bpf.h
root/usr/include/bpf/libbpf_util.h
root/usr/include/bpf/libbpf.h
root/usr/include/bpf/bpf_core_read.h
root/usr/lib64
root/usr/lib64/libbpf.so.0.1.0
root/usr/lib64/libbpf.so.0
root/usr/lib64/libbpf.a
root/usr/lib64/libbpf.so
root/usr/lib64/pkgconfig
root/usr/lib64/pkgconfig/libbpf.pc


: BPF BPF_PROG_TYPE_XDP, , , C, clang, -, . BPF, -.



Example: building an application with libbpf



/sys/kernel/btf/vmlinux, , :



$ bpftool btf dump file /sys/kernel/btf/vmlinux format c > vmlinux.h


, , , IPv4:



$ grep -A 12 'struct iphdr {' vmlinux.h
struct iphdr {
    __u8 ihl: 4;
    __u8 version: 4;
    __u8 tos;
    __be16 tot_len;
    __be16 id;
    __be16 frag_off;
    __u8 ttl;
    __u8 protocol;
    __sum16 check;
    __be32 saddr;
    __be32 daddr;
};


BPF C:



$ cat xdp-simple.bpf.c
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

SEC("xdp/simple")
int simple(void *ctx)
{
        return XDP_PASS;
}

char LICENSE[] SEC("license") = "GPL";


, . -, , vmlinux.h, bpftool btf dump —  kernel-headers, , . libbpf. , SEC, ELF. xdp/simple, BPF —  , libbpf, bpf(2). BPF C —  return XDP_PASS. , "license" .



llvm/clang, >= 10.0.0, —  (. ):



$ clang --version
clang version 11.0.0 (https://github.com/llvm/llvm-project.git afc287e0abec710398465ee1f86237513f2b5091)
...

$ clang -O2 -g -c -target bpf -I libbpf/src/root/usr/include xdp-simple.bpf.c -o xdp-simple.bpf.o


: -target bpf libbpf, . , -O2, . , , ?



$ llvm-objdump --section=xdp/simple --no-show-raw-insn -D xdp-simple.bpf.o

xdp-simple.bpf.o:       file format elf64-bpf

Disassembly of section xdp/simple:

0000000000000000 <simple>:
       0:       r0 = 2
       1:       exit


, ! , , , . libbpf —  API API. , , BPF .



, «» bpftool —  BPF ( , Daniel Borkman — BPF — ):



$ bpftool gen skeleton xdp-simple.bpf.o > xdp-simple.skel.h


xdp-simple.skel.h — , , . overkill, , BPF ELF - , .



, - — :



#include <err.h>
#include <unistd.h>
#include "xdp-simple.skel.h"

int main(int argc, char **argv)
{
    struct xdp_simple_bpf *obj;

    obj = xdp_simple_bpf__open_and_load();
    if (!obj)
        err(1, "failed to open and/or load BPF object"); /* err() appends the errno text and a newline itself */

    pause();

    xdp_simple_bpf__destroy(obj);
}


struct xdp_simple_bpf xdp-simple.skel.h :



struct xdp_simple_bpf {
    struct bpf_object_skeleton *skeleton;
    struct bpf_object *obj;
    struct {
        struct bpf_program *simple;
    } progs;
    struct {
        struct bpf_link *simple;
    } links;
};


API: struct bpf_program *simple struct bpf_link *simple. , xdp/simple, —  , .



xdp_simple_bpf__open_and_load, ELF, , ( ELF — data, readonly data, , ..), bpf, , :



$ clang -O2 -I ./libbpf/src/root/usr/include/ xdp-simple.c -o xdp-simple ./libbpf/src/root/usr/lib64/libbpf.a -lelf -lz

$ sudo strace -e bpf ./xdp-simple
...
bpf(BPF_BTF_LOAD, 0x7ffdb8fd9670, 120)  = 3
bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_XDP, insn_cnt=2, insns=0xdfd580, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(5, 8, 0), prog_flags=0, prog_name="simple", prog_ifindex=0, expected_attach_type=0x25 /* BPF_??? */, ...}, 120) = 4


bpftool. ID:



# bpftool p | grep -A4 simple
463: xdp  name simple  tag 3b185187f1855c4c  gpl
        loaded_at 2020-08-01T01:59:49+0000  uid 0
        xlated 16B  jited 40B  memlock 4096B
        btf_id 185
        pids xdp-simple(16498)


( bpftool prog dump xlated):



# bpftool p d x id 463
int simple(void *ctx):
; return XDP_PASS;
   0: (b7) r0 = 2
   1: (95) exit


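It works: we have a program written in C loaded in the kernel. As the strace output shows, libbpf first loaded the BTF with the BPF_BTF_LOAD command and then loaded the program with BPF_PROG_LOAD.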



Kernel Helpers



BPF «» — kernel helpers. - BPF , maps, « » —  perf events, (, ) ..



Example: bpf_get_smp_processor_id



« », -, bpf_get_smp_processor_id(), kernel/bpf/helpers.c. , BPF. , , :



BPF_CALL_0(bpf_get_smp_processor_id)
{
    return smp_processor_id();
}


- BPF Linux. , , , . (, , , , BPF_CALL_3. .) , . struct bpf_func_proto, -, verifier:



const struct bpf_func_proto bpf_get_smp_processor_id_proto = {
    .func     = bpf_get_smp_processor_id,
    .gpl_only = false,
    .ret_type = RET_INTEGER,
};


-

, BPF , , , BPF_PROG_TYPE_XDP xdp_func_proto, ID - , XDP . :



static const struct bpf_func_proto *
xdp_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
{
    switch (func_id) {
    ...
    case BPF_FUNC_get_smp_processor_id:
        return &bpf_get_smp_processor_id_proto;
    ...
    }
}


BPF «» include/linux/bpf_types.h BPF_PROG_TYPE. , , C . , kernel/bpf/verifier.c bpf_types.h , bpf_verifier_ops[]:



static const struct bpf_verifier_ops *const bpf_verifier_ops[] = {
#define BPF_PROG_TYPE(_id, _name, prog_ctx_type, kern_ctx_type) \
    [_id] = & _name ## _verifier_ops,
#include <linux/bpf_types.h>
#undef BPF_PROG_TYPE
};
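The macro trick used to build bpf_verifier_ops[] can be tried in isolation. The sketch below is not kernel code: all DEMO_ and demo_ names are invented for this example, and a list macro stands in for the repeated inclusion of bpf_types.h, but the idea is the same: one list of types is expanded several times with different definitions of the per-entry macro.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for include/linux/bpf_types.h: one entry per type. */
#define DEMO_PROG_TYPES(X)      \
    X(DEMO_TYPE_SOCKET, socket) \
    X(DEMO_TYPE_XDP,    xdp)    \
    X(DEMO_TYPE_KPROBE, kprobe)

/* First expansion: an enum with one id per type. */
enum demo_prog_type {
#define X(id, name) id,
    DEMO_PROG_TYPES(X)
#undef X
    DEMO_TYPE_MAX
};

struct demo_ops {
    const char *name;
};

/* Second expansion: one ops object per type, named <name>_demo_ops. */
#define X(id, name) static const struct demo_ops name##_demo_ops = { #name };
DEMO_PROG_TYPES(X)
#undef X

/* Third expansion: a table indexed by id, built the same way as
 * bpf_verifier_ops[] in the listing above. */
static const struct demo_ops *const demo_ops_table[] = {
#define X(id, name) [id] = &name##_demo_ops,
    DEMO_PROG_TYPES(X)
#undef X
};

/* Look up the ops name for a given type id. */
static const char *demo_ops_name(enum demo_prog_type t)
{
    assert(t < DEMO_TYPE_MAX);
    return demo_ops_table[t]->name;
}
```

Adding a new type then means adding a single line to the list; the enum, the ops objects and the table all pick it up automatically.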


, BPF struct bpf_verifier_ops, _name ## _verifier_ops, .., xdp_verifier_ops xdp. xdp_verifier_ops net/core/filter.c :



const struct bpf_verifier_ops xdp_verifier_ops = {
    .get_func_proto     = xdp_func_proto,
    .is_valid_access    = xdp_is_valid_access,
    .convert_ctx_access = xdp_convert_ctx_access,
    .gen_prologue       = bpf_noop_prologue,
};


xdp_func_proto, verifier , - BPF, . verifier.c.



, BPF bpf_get_smp_processor_id. :



#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

SEC("xdp/simple")
int simple(void *ctx)
{
    if (bpf_get_smp_processor_id() != 0)
        return XDP_DROP;
    return XDP_PASS;
}

char LICENSE[] SEC("license") = "GPL";


bpf_get_smp_processor_id <bpf/bpf_helper_defs.h> libbpf



static u32 (*bpf_get_smp_processor_id)(void) = (void *) 8;


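That is, bpf_get_smp_processor_id is a function pointer whose value is 8, where 8 is the value of BPF_FUNC_get_smp_processor_id from enum bpf_func_id, which is defined for us in vmlinux.h (the file bpf_helper_defs.h is generated by a script in the kernel, so the "magic" numbers are ok). The function takes no arguments and returns a value of type __u32. When we call it from our program, clang generates a BPF_CALL instruction "of the right kind". Let's compile the program and look at the xdp/simple section: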



$ clang -O2 -g -c -target bpf -I libbpf/src/root/usr/include xdp-simple.bpf.c -o xdp-simple.bpf.o
$ llvm-objdump -D --section=xdp/simple xdp-simple.bpf.o

xdp-simple.bpf.o:       file format elf64-bpf

Disassembly of section xdp/simple:

0000000000000000 <simple>:
       0:       85 00 00 00 08 00 00 00 call 8
       1:       bf 01 00 00 00 00 00 00 r1 = r0
       2:       67 01 00 00 20 00 00 00 r1 <<= 32
       3:       77 01 00 00 20 00 00 00 r1 >>= 32
       4:       b7 00 00 00 02 00 00 00 r0 = 2
       5:       15 01 01 00 00 00 00 00 if r1 == 0 goto +1 <LBB0_2>
       6:       b7 00 00 00 01 00 00 00 r0 = 1

0000000000000038 <LBB0_2>:
       7:       95 00 00 00 00 00 00 00 exit


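In the first line we see the call instruction, whose IMM parameter equals 8 and whose SRC_REG is zero. By the ABI convention used by the verifier, this is a call to helper function number eight. Once it has been called, the logic is simple. The return value in r0 is copied to r1, and in lines 2,3 it is cast to u32: the upper 32 bits are zeroed. In lines 4,5,6,7 we return 2 (XDP_PASS) or 1 (XDP_DROP), depending on whether the helper from line 0 returned zero or a non-zero value.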
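The effect of the shift pair and the branch in this listing can be mirrored in plain C. This is a sketch for illustration: the function names are invented, and the code only models instructions 1 to 7 on an ordinary 64-bit integer rather than running any BPF.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors "r1 <<= 32; r1 >>= 32": a logical shift left and right by 32
 * clears the upper half, i.e. zero-extends the low 32 bits. */
static uint64_t zext32_by_shifts(uint64_t r)
{
    r <<= 32;
    r >>= 32;
    return r;
}

/* Models instructions 1..7: returns 2 (XDP_PASS) when the helper result,
 * truncated to u32, is zero and 1 (XDP_DROP) otherwise. */
static int simple_verdict(uint64_t helper_ret)
{
    uint64_t r1 = zext32_by_shifts(helper_ret);
    uint64_t r0 = 2; /* XDP_PASS */

    assert(r1 <= UINT32_MAX); /* upper bits are gone after the shifts */
    if (r1 != 0)
        r0 = 1; /* XDP_DROP */
    return (int)r0;
}
```

Only the low 32 bits of the helper's return value take part in the comparison, which is exactly what the cast to u32 in the C source asks for.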



: bpftool prog dump xlated:



$ bpftool gen skeleton xdp-simple.bpf.o > xdp-simple.skel.h
$ clang -O2 -g -I ./libbpf/src/root/usr/include/ -o xdp-simple xdp-simple.c ./libbpf/src/root/usr/lib64/libbpf.a -lelf -lz
$ sudo ./xdp-simple &
[2] 10914

$ sudo bpftool p | grep simple
523: xdp  name simple  tag 44c38a10c657e1b0  gpl
        pids xdp-simple(10915)

$ sudo bpftool p d x id 523
int simple(void *ctx):
; if (bpf_get_smp_processor_id() != 0)
   0: (85) call bpf_get_smp_processor_id#114128
   1: (bf) r1 = r0
   2: (67) r1 <<= 32
   3: (77) r1 >>= 32
   4: (b7) r0 = 2
; }
   5: (15) if r1 == 0x0 goto pc+1
   6: (b7) r0 = 1
   7: (95) exit


We can see that the verifier found the right kernel helper.



: , , !



-



u64 fn(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)


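Helper functions take their arguments in registers r1-r5 and return the value in register r0. There are no helpers that take more than five arguments, and none are expected to be added.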



kernel helper BPF . xdp-simple.bpf.c ( ):



SEC("xdp/simple")
int simple(void *ctx)
{
    bpf_printk("running on CPU%u\n", bpf_get_smp_processor_id());
    return XDP_PASS;
}


CPU, . :



$ llvm-objdump -D --section=xdp/simple --no-show-raw-insn xdp-simple.bpf.o

0000000000000000 <simple>:
       0:       r1 = 10
       1:       *(u16 *)(r10 - 8) = r1
       2:       r1 = 8441246879787806319 ll
       4:       *(u64 *)(r10 - 16) = r1
       5:       r1 = 2334956330918245746 ll
       7:       *(u64 *)(r10 - 24) = r1
       8:       call 8
       9:       r1 = r10
      10:       r1 += -24
      11:       r2 = 18
      12:       r3 = r0
      13:       call 6
      14:       r0 = 2
      15:       exit


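In lines 0-7 we write the string running on CPU%u\n onto the stack, and in line 8 we call the familiar bpf_get_smp_processor_id. In lines 9-12 we prepare the arguments of the helper bpf_printk in registers r1, r2, r3. Why three and not two? Because bpf_printk is a macro wrapper around the real helper bpf_trace_printk, which also needs the size of the format string.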
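The three constants stored in lines 0-7 are simply the format string cut into little-endian chunks. The following C sketch rebuilds the string from the immediates shown in the disassembly; the function names are invented for this example, and the stack slots r10 - 24, r10 - 16 and r10 - 8 are modelled as consecutive bytes of a buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store the n low bytes of v at dst in little-endian order. */
static void store_le(char *dst, uint64_t v, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (char)((v >> (8 * i)) & 0xff);
}

/* Rebuild the 18-byte format string from the immediates in the listing. */
static void rebuild_format(char *out)
{
    store_le(out,      2334956330918245746ULL, 8); /* *(u64 *)(r10 - 24) */
    store_le(out + 8,  8441246879787806319ULL, 8); /* *(u64 *)(r10 - 16) */
    store_le(out + 16, 10, 2);                     /* *(u16 *)(r10 - 8)  */
    assert(out[17] == '\0'); /* the u16 store also writes the terminator */
}
```

The resulting buffer is 18 bytes long including the terminating zero, which is the value loaded into r2 in line 11.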



xdp-simple.c, lo - !



$ cat xdp-simple.c
#include <linux/if_link.h>
#include <err.h>
#include <unistd.h>
#include "xdp-simple.skel.h"

int main(int argc, char **argv)
{
    __u32 flags = XDP_FLAGS_SKB_MODE;
    struct xdp_simple_bpf *obj;

    obj = xdp_simple_bpf__open_and_load();
    if (!obj)
        err(1, "failed to open and/or load BPF object"); /* err() appends the errno text and a newline itself */

    /* ifindex 1 is lo: detach any XDP program that is attached, then attach ours */
    bpf_set_link_xdp_fd(1, -1, flags);
    bpf_set_link_xdp_fd(1, bpf_program__fd(obj->progs.simple), flags);

    xdp_simple_bpf__destroy(obj);
}


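Here we use the function bpf_set_link_xdp_fd, which attaches BPF programs of type XDP to network interfaces. We hard-coded the number of the interface lo, which is always 1. We call the function twice, the first time to detach the old program if one was attached. Note that now we need neither pause nor an infinite loop: our loader exits, but the BPF program is not destroyed because it is attached to an event source. After a successful load and attach, the program will run for every network packet arriving on lo.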



lo:



$ sudo ./xdp-simple
$ sudo bpftool p | grep simple
669: xdp  name simple  tag 4fca62e77ccb43d6  gpl
$ ip l show dev lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 xdpgeneric qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    prog/xdp id 669


The program we loaded has ID 669, and we see the same ID on the interface lo. Let's send a couple of packets to 127.0.0.1 (request + reply):



$ ping -c1 localhost


and now look at the contents of the debug virtual file /sys/kernel/debug/tracing/trace_pipe, to which bpf_printk writes its messages:



# cat /sys/kernel/debug/tracing/trace_pipe
ping-13937 [000] d.s1 442015.377014: bpf_trace_printk: running on CPU0
ping-13937 [000] d.s1 442015.377027: bpf_trace_printk: running on CPU0


Two packets were seen on lo and processed on CPU0: our first full-fledged, if pointless, BPF program works!



, bpf_printk : production, - .



Accessing maps from BPF programs



Example: using a map from a BPF program



, . , , . xdp-simple.bpf.c :



#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __uint(max_entries, 8);
    __type(key, u32);
    __type(value, u64);
} woo SEC(".maps");

SEC("xdp/simple")
int simple(void *ctx)
{
    u32 key = bpf_get_smp_processor_id();
    u32 *val;

    val = bpf_map_lookup_elem(&woo, &key);
    if (!val)
        return XDP_ABORTED;

    *val += 1;

    return XDP_PASS;
}

char LICENSE[] SEC("license") = "GPL";


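At the start of the program we added the definition of the map woo: an array of 8 elements holding values of type u64 (in C we would declare such an array as u64 woo[8]). In the program "xdp/simple" we read the number of the current CPU into the variable key and then use the helper bpf_map_lookup_elem to get a pointer to the corresponding array entry, which we increment by one. In other words, we count how many packets were processed on each CPU. Let's try to run the program: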



$ clang -O2 -g -c -target bpf -I libbpf/src/root/usr/include xdp-simple.bpf.c -o xdp-simple.bpf.o
$ bpftool gen skeleton xdp-simple.bpf.o > xdp-simple.skel.h
$ clang -O2 -g -I ./libbpf/src/root/usr/include/ -o xdp-simple xdp-simple.c ./libbpf/src/root/usr/lib64/libbpf.a -lelf -lz
$ sudo ./xdp-simple


, lo :



$ ip l show dev lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 xdpgeneric qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    prog/xdp id 108

$ for s in `seq 234`; do sudo ping -f -c 100 127.0.0.1 >/dev/null 2>&1; done


:



$ sudo bpftool map dump name woo
[
    { "key": 0, "value": 0 },
    { "key": 1, "value": 400 },
    { "key": 2, "value": 0 },
    { "key": 3, "value": 0 },
    { "key": 4, "value": 0 },
    { "key": 5, "value": 0 },
    { "key": 6, "value": 0 },
    { "key": 7, "value": 46400 }
]


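Almost all packets were processed on CPU7. That is not important to us; what matters is that the program works and that we now understand how to access maps from BPF programs with the bpf_map_* helpers.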





, BPF



val = bpf_map_lookup_elem(&woo, &key);


-



void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)


&woo struct { ... }...



, , &woo ( 4):



llvm-objdump -D --section xdp/simple xdp-simple.bpf.o

xdp-simple.bpf.o:       file format elf64-bpf

Disassembly of section xdp/simple:

0000000000000000 <simple>:
       0:       85 00 00 00 08 00 00 00 call 8
       1:       63 0a fc ff 00 00 00 00 *(u32 *)(r10 - 4) = r0
       2:       bf a2 00 00 00 00 00 00 r2 = r10
       3:       07 02 00 00 fc ff ff ff r2 += -4
       4:       18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll
       6:       85 00 00 00 01 00 00 00 call 1
...


:



$ llvm-readelf -r xdp-simple.bpf.o | head -4

Relocation section '.relxdp/simple' at offset 0xe18 contains 1 entries:
    Offset             Info             Type               Symbol's Value  Symbol's Name
0000000000000020  0000002700000001 R_BPF_64_64            0000000000000000 woo


, map ( 4):



$ sudo bpftool prog dump x name simple
int simple(void *ctx):
   0: (85) call bpf_get_smp_processor_id#114128
   1: (63) *(u32 *)(r10 -4) = r0
   2: (bf) r2 = r10
   3: (07) r2 += -4
   4: (18) r1 = map[id:64]
...


, , - &woo - libbpf. strace:



$ sudo strace -e bpf ./xdp-simple
...
bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_ARRAY, key_size=4, value_size=8, max_entries=8, map_name="woo", ...}, 120) = 4
bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_XDP, prog_name="simple", ...}, 120) = 5


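We see that libbpf created the map woo and then loaded our program simple. Let's look more closely at how the program is loaded:



  • the call to xdp_simple_bpf__open_and_load from the file xdp-simple.skel.h
  • which calls xdp_simple_bpf__load from the file xdp-simple.skel.h
  • which calls bpf_object__load_skeleton from the file libbpf/src/libbpf.c
  • which calls bpf_object__load_xattr from libbpf/src/libbpf.c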


, , bpf_object__create_maps, maps, . ( , BPF_MAP_CREATE strace.) bpf_object__relocate , , woo . , , - bpf_program__relocate, :



case RELO_LD64:
    insn[0].src_reg = BPF_PSEUDO_MAP_FD;
    insn[0].imm = obj->maps[relo->map_idx].fd;
    break;


,



18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll


- BPF_PSEUDO_MAP_FD, IMM , , , 0xdeadbeef,



18 11 00 00 ef be ad de 00 00 00 00 00 00 00 00 r1 = 0 ll
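The patching step can be reproduced on raw bytes. The sketch below is an illustration, not libbpf code: patch_map_fd and DEMO_BPF_PSEUDO_MAP_FD are names invented here (the value 1 is that of BPF_PSEUDO_MAP_FD in linux/bpf.h), and the byte layout assumes the little-endian encoding seen in the dumps, where the high nibble of the second byte is src_reg.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local copy of the BPF_PSEUDO_MAP_FD value from linux/bpf.h. */
#define DEMO_BPF_PSEUDO_MAP_FD 1

/* Hypothetical helper: rewrite a 16-byte lddw so that its source register
 * field marks the immediate as a map file descriptor and its first Imm
 * holds that descriptor. */
static void patch_map_fd(uint8_t *insn, uint32_t map_fd)
{
    assert(insn[0] == 0x18); /* BPF_LD | BPF_DW | BPF_IMM */

    /* keep dst_reg (low nibble), set src_reg (high nibble) */
    insn[1] = (uint8_t)((insn[1] & 0x0f) | (DEMO_BPF_PSEUDO_MAP_FD << 4));

    /* Imm of the first slot, little-endian */
    insn[4] = (uint8_t)(map_fd & 0xff);
    insn[5] = (uint8_t)((map_fd >> 8) & 0xff);
    insn[6] = (uint8_t)((map_fd >> 16) & 0xff);
    insn[7] = (uint8_t)((map_fd >> 24) & 0xff);
}
```

Starting from the bytes 18 01 00 00 followed by zeros and a descriptor value of 0xdeadbeef, the first eight bytes become 18 11 00 00 ef be ad de.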


BPF. BPF_MAP_CREATE, ID BPF_MAP_GET_FD_BY_ID.



, libbpf :



  • libbpf ELF,
  • LD64


, , . , —  BPF_PSEUDO_MAP_FD - , — kernel/bpf/verifier.c, struct bpf_map:



static int replace_map_fd_with_map_ptr(struct bpf_verifier_env *env) {
    ...

    f = fdget(insn[0].imm);
    map = __bpf_map_get(f);
    if (insn->src_reg == BPF_PSEUDO_MAP_FD) {
        addr = (unsigned long)map;
    }
    insn[0].imm = (u32)addr;
    insn[1].imm = addr >> 32;


( ). :



  • verifier struct bpf_map


ELF libbpf , .



Writing a loader without libbpf



, , , , , libbpf. , , , , ply, BPF .



, , xdp-simple. , , gist.



:



  • BPF_MAP_TYPE_ARRAY BPF_MAP_CREATE,
  • , ,
  • lo,




int main(void)
{
    int map_fd, prog_fd;

    map_fd = map_create();
    if (map_fd < 0)
        err(1, "bpf: BPF_MAP_CREATE");

    prog_fd = prog_load(map_fd);
    if (prog_fd < 0)
        err(1, "bpf: BPF_PROG_LOAD");

    xdp_attach(1, prog_fd);
}


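Here map_create creates the map in the same way as in the first example about the bpf system call: "kernel, please make me a new map in the form of an array of 8 elements of type __u64 and give me back a file descriptor":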



static int map_create()
{
    union bpf_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.map_type = BPF_MAP_TYPE_ARRAY;
    attr.key_size = sizeof(__u32);
    attr.value_size = sizeof(__u64);
    attr.max_entries = 8;
    strncpy(attr.map_name, "woo", sizeof(attr.map_name));
    return syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
}


:



static int prog_load(int map_fd)
{
    union bpf_attr attr;
    struct bpf_insn insns[] = {
        ...
    };

    memset(&attr, 0, sizeof(attr));
    attr.prog_type = BPF_PROG_TYPE_XDP;
    attr.insns     = ptr_to_u64(insns);
    attr.insn_cnt  = sizeof(insns)/sizeof(insns[0]);
    attr.license   = ptr_to_u64("GPL");
    strncpy(attr.prog_name, "woo", sizeof(attr.prog_name));
    return syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
}


prog_load — BPF struct bpf_insn insns[]. , C, :



$ llvm-objdump -D --section xdp/simple xdp-simple.bpf.o

0000000000000000 <simple>:
       0:       85 00 00 00 08 00 00 00 call 8
       1:       63 0a fc ff 00 00 00 00 *(u32 *)(r10 - 4) = r0
       2:       bf a2 00 00 00 00 00 00 r2 = r10
       3:       07 02 00 00 fc ff ff ff r2 += -4
       4:       18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll
       6:       85 00 00 00 01 00 00 00 call 1
       7:       b7 01 00 00 00 00 00 00 r1 = 0
       8:       15 00 04 00 00 00 00 00 if r0 == 0 goto +4 <LBB0_2>
       9:       61 01 00 00 00 00 00 00 r1 = *(u32 *)(r0 + 0)
      10:       07 01 00 00 01 00 00 00 r1 += 1
      11:       63 10 00 00 00 00 00 00 *(u32 *)(r0 + 0) = r1
      12:       b7 01 00 00 02 00 00 00 r1 = 2

0000000000000068 <LBB0_2>:
      13:       bf 10 00 00 00 00 00 00 r0 = r1
      14:       95 00 00 00 00 00 00 00 exit


, 14 struct bpf_insn (: , , linux/bpf.h linux/bpf_common.h struct bpf_insn insns[] ):



struct bpf_insn insns[] = {
    /* 85 00 00 00 08 00 00 00 call 8 */
    {
        .code = BPF_JMP | BPF_CALL,
        .imm = 8,
    },

    /* 63 0a fc ff 00 00 00 00 *(u32 *)(r10 - 4) = r0 */
    {
        .code = BPF_STX | BPF_MEM | BPF_W, /* 0x63: 32-bit store */
        .off = -4,
        .src_reg = BPF_REG_0,
        .dst_reg = BPF_REG_10,
    },

    /* bf a2 00 00 00 00 00 00 r2 = r10 */
    {
        .code = BPF_ALU64 | BPF_MOV | BPF_X,
        .src_reg = BPF_REG_10,
        .dst_reg = BPF_REG_2,
    },

    /* 07 02 00 00 fc ff ff ff r2 += -4 */
    {
        .code = BPF_ALU64 | BPF_ADD | BPF_K,
        .dst_reg = BPF_REG_2,
        .imm = -4,
    },

    /* 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll */
    {
        .code = BPF_LD | BPF_DW | BPF_IMM,
        .src_reg = BPF_PSEUDO_MAP_FD,
        .dst_reg = BPF_REG_1,
        .imm = map_fd,
    },
    { }, /* placeholder */

    /* 85 00 00 00 01 00 00 00 call 1 */
    {
        .code = BPF_JMP | BPF_CALL,
        .imm = 1,
    },

    /* b7 01 00 00 00 00 00 00 r1 = 0 */
    {
        .code = BPF_ALU64 | BPF_MOV | BPF_K,
        .dst_reg = BPF_REG_1,
        .imm = 0,
    },

    /* 15 00 04 00 00 00 00 00 if r0 == 0 goto +4 <LBB0_2> */
    {
        .code = BPF_JMP | BPF_JEQ | BPF_K,
        .off = 4,
        .dst_reg = BPF_REG_0, /* with BPF_K, dst_reg is compared against imm */
        .imm = 0,
    },

    /* 61 01 00 00 00 00 00 00 r1 = *(u32 *)(r0 + 0) */
    {
        .code = BPF_LDX | BPF_MEM | BPF_W, /* 0x61: 32-bit load */
        .off = 0,
        .src_reg = BPF_REG_0,
        .dst_reg = BPF_REG_1,
    },

    /* 07 01 00 00 01 00 00 00 r1 += 1 */
    {
        .code = BPF_ALU64 | BPF_ADD | BPF_K,
        .dst_reg = BPF_REG_1,
        .imm = 1,
    },

    /* 63 10 00 00 00 00 00 00 *(u32 *)(r0 + 0) = r1 */
    {
        .code = BPF_MEM | BPF_STX,
        .src_reg = BPF_REG_1,
        .dst_reg = BPF_REG_0,
    },

    /* b7 01 00 00 02 00 00 00 r1 = 2 */
    {
        .code = BPF_ALU64 | BPF_MOV | BPF_K,
        .dst_reg = BPF_REG_1,
        .imm = 2,
    },

    /* <LBB0_2>: bf 10 00 00 00 00 00 00 r0 = r1 */
    {
        .code = BPF_ALU64 | BPF_MOV | BPF_X,
        .src_reg = BPF_REG_1,
        .dst_reg = BPF_REG_0,
    },

    /* 95 00 00 00 00 00 00 00 exit */
    {
        .code = BPF_JMP | BPF_EXIT
    },
};


An exercise for those who did not write this themselves: find map_fd.
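
To make the link between the raw bytes in the disassembly and the struct bpf_insn fields easier to follow, here is a minimal sketch of the 8-byte encoding. The helper name encode_insn is hypothetical (it is not part of any kernel header), the opcode constants are copied by value from linux/bpf_common.h, and the sketch assumes the little-endian layout seen in the dump above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Opcode bits, same values as in linux/bpf_common.h and linux/bpf.h. */
#define BPF_JMP   0x05
#define BPF_ALU64 0x07
#define BPF_ADD   0x00
#define BPF_JEQ   0x10
#define BPF_K     0x00

/*
 * Hypothetical helper: pack one instruction into its 8-byte form.
 * byte 0    = opcode
 * byte 1    = src_reg in the high nibble, dst_reg in the low nibble
 * bytes 2-3 = off, little endian
 * bytes 4-7 = imm, little endian
 */
void encode_insn(uint8_t code, uint8_t dst, uint8_t src,
                 int16_t off, int32_t imm, uint8_t out[8])
{
    uint16_t uoff = (uint16_t)off;
    uint32_t uimm = (uint32_t)imm;

    assert(dst <= 10 && src <= 10); /* r0..r10 */
    out[0] = code;
    out[1] = (uint8_t)((src << 4) | dst);
    out[2] = (uint8_t)(uoff & 0xff);
    out[3] = (uint8_t)(uoff >> 8);
    out[4] = (uint8_t)(uimm & 0xff);
    out[5] = (uint8_t)((uimm >> 8) & 0xff);
    out[6] = (uint8_t)((uimm >> 16) & 0xff);
    out[7] = (uint8_t)(uimm >> 24);
}
```

For example, encoding BPF_ALU64 | BPF_ADD | BPF_K with dst 1 and imm 1 gives the bytes 07 01 00 00 01 00 00 00, the same as the r1 += 1 line of the dump.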



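One piece of our program remains unexplained: xdp_attach. Unfortunately, programs of type XDP cannot be attached with the bpf system call. The people who created BPF and XDP came from the Linux networking community, so they used the kernel interface most familiar to them (though not to ordinary people): netlink sockets; see also RFC3549. The simplest way to implement xdp_attach is to copy the code from libbpf, namely from the file netlink.c, which is what we did, shortening it a little: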



netlink

Open a netlink socket of type NETLINK_ROUTE:



int netlink_open(__u32 *nl_pid)
{
    struct sockaddr_nl sa;
    socklen_t addrlen;
    int one = 1;
    int sock;

    memset(&sa, 0, sizeof(sa));
    sa.nl_family = AF_NETLINK;

    sock = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
    if (sock < 0)
        err(1, "socket");

    if (setsockopt(sock, SOL_NETLINK, NETLINK_EXT_ACK, &one, sizeof(one)) < 0)
        warnx("netlink error reporting not supported");

    if (bind(sock, (struct sockaddr *)&sa, sizeof(sa)) < 0)
        err(1, "bind");

    addrlen = sizeof(sa);
    if (getsockname(sock, (struct sockaddr *)&sa, &addrlen) < 0)
        err(1, "getsockname");

    *nl_pid = sa.nl_pid;
    return sock;
}


Read the reply from this socket:



static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq)
{
    bool multipart = true;
    struct nlmsgerr *errm;
    struct nlmsghdr *nh;
    char buf[4096];
    int len, ret;

    while (multipart) {
        multipart = false;
        len = recv(sock, buf, sizeof(buf), 0);
        if (len < 0)
            err(1, "recv");

        if (len == 0)
            break;

        for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len);
                nh = NLMSG_NEXT(nh, len)) {
            if (nh->nlmsg_pid != nl_pid)
                errx(1, "wrong pid");
            if (nh->nlmsg_seq != seq)
                errx(1, "INVSEQ");
            if (nh->nlmsg_flags & NLM_F_MULTI)
                multipart = true;
            switch (nh->nlmsg_type) {
                case NLMSG_ERROR:
                    errm = (struct nlmsgerr *)NLMSG_DATA(nh);
                    if (!errm->error)
                        continue;
                    ret = errm->error;
                    // libbpf_nla_dump_errormsg(nh); too much code to copy...
                    goto done;
                case NLMSG_DONE:
                    return 0;
                default:
                    break;
            }
        }
    }
    ret = 0;
done:
    return ret;
}


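Finally, here is our function, which opens the socket and sends it a special message containing the program's file descriptor: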



static int xdp_attach(int ifindex, int prog_fd)
{
    int sock, seq = 0, ret;
    struct nlattr *nla, *nla_xdp;
    struct {
        struct nlmsghdr  nh;
        struct ifinfomsg ifinfo;
        char             attrbuf[64];
    } req;
    __u32 nl_pid = 0;

    sock = netlink_open(&nl_pid);
    if (sock < 0)
        return sock;

    memset(&req, 0, sizeof(req));
    req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
    req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
    req.nh.nlmsg_type = RTM_SETLINK;
    req.nh.nlmsg_pid = 0;
    req.nh.nlmsg_seq = ++seq;
    req.ifinfo.ifi_family = AF_UNSPEC;
    req.ifinfo.ifi_index = ifindex;

    /* start the nested attribute for XDP */
    nla = (struct nlattr *)(((char *)&req)
            + NLMSG_ALIGN(req.nh.nlmsg_len));
    nla->nla_type = NLA_F_NESTED | IFLA_XDP;
    nla->nla_len = NLA_HDRLEN;

    /* add XDP fd */
    nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
    nla_xdp->nla_type = IFLA_XDP_FD;
    nla_xdp->nla_len = NLA_HDRLEN + sizeof(int);
    memcpy((char *)nla_xdp + NLA_HDRLEN, &prog_fd, sizeof(prog_fd));
    nla->nla_len += nla_xdp->nla_len;

    /* add the XDP flags; generic (SKB) mode is hard-coded here */
    __u32 flags = XDP_FLAGS_SKB_MODE;
    nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
    nla_xdp->nla_type = IFLA_XDP_FLAGS;
    nla_xdp->nla_len = NLA_HDRLEN + sizeof(flags);
    memcpy((char *)nla_xdp + NLA_HDRLEN, &flags, sizeof(flags));
    nla->nla_len += nla_xdp->nla_len;

    req.nh.nlmsg_len += NLA_ALIGN(nla->nla_len);

    if (send(sock, &req, req.nh.nlmsg_len, 0) < 0)
        err(1, "send");
    ret = bpf_netlink_recv(sock, nl_pid, seq);

cleanup:
    close(sock);
    return ret;
}
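
The length bookkeeping in xdp_attach is easy to get wrong, so here is a minimal sketch of the same message layout built through a small helper, without opening any socket. The names put_attr, build_xdp_req and struct xdp_req are hypothetical and introduced only for illustration; the macros and constants come from the Linux uapi headers. The sketch assumes that, as in libbpf, passing -1 as the fd asks the kernel to detach the program.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/if_link.h>

struct xdp_req {
    struct nlmsghdr  nh;
    struct ifinfomsg ifinfo;
    char             attrbuf[64];
};

/* Hypothetical helper: append one attribute at offset *len, then advance *len. */
struct nlattr *put_attr(char *buf, size_t cap, size_t *len,
                        unsigned short type, const void *data,
                        unsigned short data_len)
{
    struct nlattr *nla = (struct nlattr *)(buf + *len);

    assert(*len + NLA_ALIGN(NLA_HDRLEN + data_len) <= cap);
    nla->nla_type = type;
    nla->nla_len = (unsigned short)(NLA_HDRLEN + data_len);
    if (data_len)
        memcpy((char *)nla + NLA_HDRLEN, data, data_len);
    *len += NLA_ALIGN(nla->nla_len);
    return nla;
}

/* Fill in an RTM_SETLINK request carrying IFLA_XDP { FD, FLAGS }. */
size_t build_xdp_req(struct xdp_req *req, int ifindex, int prog_fd, __u32 flags)
{
    size_t len = 0; /* bytes used in attrbuf */
    struct nlattr *nest;

    memset(req, 0, sizeof(*req));
    req->nh.nlmsg_type = RTM_SETLINK;
    req->nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
    req->ifinfo.ifi_family = AF_UNSPEC;
    req->ifinfo.ifi_index = ifindex;

    nest = put_attr(req->attrbuf, sizeof(req->attrbuf), &len,
                    NLA_F_NESTED | IFLA_XDP, NULL, 0);
    put_attr(req->attrbuf, sizeof(req->attrbuf), &len,
             IFLA_XDP_FD, &prog_fd, sizeof(prog_fd));
    put_attr(req->attrbuf, sizeof(req->attrbuf), &len,
             IFLA_XDP_FLAGS, &flags, sizeof(flags));

    /* The nested attribute starts at offset 0, so it spans everything written. */
    nest->nla_len = (unsigned short)len;
    req->nh.nlmsg_len = (__u32)(NLMSG_LENGTH(sizeof(struct ifinfomsg)) + len);
    return req->nh.nlmsg_len;
}
```

The nested IFLA_XDP attribute ends up 20 bytes long: a 4-byte header plus two 8-byte children. The resulting buffer could then be passed to send() exactly as in xdp_attach.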


Now everything is ready for testing:



$ cc nolibbpf.c -o nolibbpf
$ sudo strace -e bpf ./nolibbpf
bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_ARRAY, map_name="woo", ...}, 72) = 3
bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_XDP, insn_cnt=15, prog_name="woo", ...}, 72) = 4
+++ exited with 0 +++


Let's check that our program is attached to lo:



$ ip l show dev lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 xdpgeneric qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    prog/xdp id 160


Send some pings and look at the map:



$ for s in `seq 234`; do sudo ping -f -c 100 127.0.0.1 >/dev/null 2>&1; done
$ sudo bpftool m dump name woo
key: 00 00 00 00  value: 90 01 00 00 00 00 00 00
key: 01 00 00 00  value: 00 00 00 00 00 00 00 00
key: 02 00 00 00  value: 00 00 00 00 00 00 00 00
key: 03 00 00 00  value: 00 00 00 00 00 00 00 00
key: 04 00 00 00  value: 00 00 00 00 00 00 00 00
key: 05 00 00 00  value: 00 00 00 00 00 00 00 00
key: 06 00 00 00  value: 40 b5 00 00 00 00 00 00
key: 07 00 00 00  value: 00 00 00 00 00 00 00 00
Found 8 elements


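Hooray, everything works. Note, by the way, that our map is again displayed as raw bytes. This is because, unlike libbpf, we did not load any type information (BTF). We will talk about that in more detail next time.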
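
Since bpftool prints the values as raw bytes, it may help to decode them by hand. The sketch below uses a hypothetical helper, le_decode, and assumes the values were dumped on a little-endian machine. Under that assumption the two non-zero entries are 400 and 46400, which sum to 46800. That would be consistent with 234 runs of 100 pings if each ping is seen twice on lo, once as the request and once as the reply.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode up to 8 little-endian bytes into an integer. */
uint64_t le_decode(const uint8_t *bytes, size_t n)
{
    uint64_t v = 0;

    assert(n <= 8);
    /* Walk from the most significant (last) byte down to the first one. */
    for (size_t i = n; i > 0; i--)
        v = (v << 8) | bytes[i - 1];
    return v;
}
```

Calling le_decode on the bytes 90 01 00 00 00 00 00 00 returns 0x190, that is, 400.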





Development tools



In this section we look at the minimal toolkit of a BPF developer.



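Generally speaking, developing BPF programs needs nothing special: BPF runs on any decent distribution kernel, and programs are built with clang, which can be installed from a package. However, because BPF is under active development, the kernel and the tools change constantly, so if you do not want to write BPF programs with the old-fashioned methods of 2019, you will have to build: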



  • llvm/clang
  • pahole
  • bpftool


(For reference: this section and all the examples in the article were run on Debian 10.)



llvm/clang



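BPF is friends with LLVM and, although BPF programs can now also be compiled with gcc, all current development is done for LLVM. So the first thing we do is build the current version of clang from git: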



$ sudo apt install ninja-build
$ git clone --depth 1 https://github.com/llvm/llvm-project.git
$ mkdir -p llvm-project/llvm/build/install
$ cd llvm-project/llvm/build
$ cmake .. -G "Ninja" -DLLVM_TARGETS_TO_BUILD="BPF;X86" \
                      -DLLVM_ENABLE_PROJECTS="clang" \
                      -DBUILD_SHARED_LIBS=OFF \
                      -DCMAKE_BUILD_TYPE=Release \
                      -DLLVM_BUILD_RUNTIME=OFF
$ time ninja
...   
$


Now we can check that everything was built correctly:



$ ./bin/llc --version
LLVM (http://llvm.org/):
  LLVM version 11.0.0git
  Optimized build.
  Default target: x86_64-unknown-linux-gnu
  Host CPU: znver1

  Registered Targets:
    bpf    - BPF (host endian)
    bpfeb  - BPF (big endian)
    bpfel  - BPF (little endian)
    x86    - 32-bit X86: Pentium-Pro and above
    x86-64 - 64-bit X86: EM64T and AMD64


(The clang build instructions are taken from bpf_devel_QA.)



We will not install the freshly built programs; instead we simply add them to PATH, for example:



export PATH="`pwd`/bin:$PATH"


(This can go into .bashrc or into a separate file. Personally, I put such things into ~/bin/activate-llvm.sh and run . activate-llvm.sh when needed.)



Pahole and BTF



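The pahole utility is used during the kernel build to generate debugging information in the BTF format. In this article we will not go into the details of BTF, beyond the fact that it is convenient and we want to use it. So, if you are going to build your own kernel, build pahole first (without pahole you cannot build a kernel with the CONFIG_DEBUG_INFO_BTF option):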



$ git clone https://git.kernel.org/pub/scm/devel/pahole/pahole.git
$ cd pahole/
$ sudo apt install cmake
$ mkdir build
$ cd build/
$ cmake -D__LIB=lib ..
$ make
$ sudo make install
$ which pahole
/usr/local/bin/pahole


Kernels for experimenting with BPF



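When exploring what BPF can do, you will want to build your own kernel. Strictly speaking, this is not required, since you can compile and load BPF programs on a distribution kernel as well. However, having your own kernel lets you use the newest BPF features, which will reach your distribution months later at best or, as with some debugging tools, will not be packaged at all in the foreseeable future.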



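To build a kernel you need, first, the kernel itself and, second, a kernel configuration file. For experiments with BPF we can use an ordinary vanilla kernel or one of the development kernels. Historically, BPF development happens within the Linux networking community, so all changes sooner or later pass through David Miller, the maintainer of the Linux networking subsystem. Depending on their nature (fixes or new features), networking changes land in one of two kernels: net or net-next. Changes for BPF are split in the same way between bpf and bpf-next, which are then pulled into net and net-next, respectively. See bpf_devel_QA and netdev-FAQ for details. So choose a kernel according to your taste and the stability requirements of the system you are testing on (the *-next kernels are the least stable of those listed).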



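Explaining how to handle kernel configuration files is outside the scope of this article: we assume that you either already know how to do it or are ready to learn on your own. However, the following instructions should be, more or less, enough to get a working system with BPF support.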



Download one of the kernels mentioned above:



$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git
$ cd bpf-next


Build a minimal working kernel config:



$ cp /boot/config-`uname -r` .config
$ make localmodconfig


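Enable the BPF options of your choice in the .config file (most likely CONFIG_BPF itself will already be enabled, because systemd uses it). Here is the list of options from the kernel used for this article: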



CONFIG_CGROUP_BPF=y
CONFIG_BPF=y
CONFIG_BPF_LSM=y
CONFIG_BPF_SYSCALL=y
CONFIG_ARCH_WANT_DEFAULT_BPF_JIT=y
CONFIG_BPF_JIT_ALWAYS_ON=y
CONFIG_BPF_JIT_DEFAULT_ON=y
CONFIG_IPV6_SEG6_BPF=y
# CONFIG_NETFILTER_XT_MATCH_BPF is not set
# CONFIG_BPFILTER is not set
CONFIG_NET_CLS_BPF=y
CONFIG_NET_ACT_BPF=y
CONFIG_BPF_JIT=y
CONFIG_BPF_STREAM_PARSER=y
CONFIG_LWTUNNEL_BPF=y
CONFIG_HAVE_EBPF_JIT=y
CONFIG_BPF_EVENTS=y
CONFIG_BPF_KPROBE_OVERRIDE=y
CONFIG_DEBUG_INFO_BTF=y
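
If you want to check such a list mechanically rather than by eye, the sketch below shows one possible way to test whether an option is set to y in the text of a .config file. The function name config_is_builtin is hypothetical, and the sketch deliberately treats anything other than an exact OPTION=y line (including =m and "is not set" comments) as not enabled.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: return 1 if `config` has a line exactly "option=y". */
int config_is_builtin(const char *config, const char *option)
{
    size_t optlen = strlen(option);
    const char *line = config;

    assert(optlen > 0);
    while (line && *line) {
        /* Match the whole option name, then "=y", then the end of the line. */
        if (strncmp(line, option, optlen) == 0 &&
            strncmp(line + optlen, "=y", 2) == 0 &&
            (line[optlen + 2] == '\n' || line[optlen + 2] == '\0'))
            return 1;
        line = strchr(line, '\n'); /* jump to the next line, if any */
        if (line)
            line++;
    }
    return 0;
}
```

Because the option name must be followed immediately by =y, asking for CONFIG_BPF does not accidentally match a CONFIG_BPF_JIT=y line.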


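Next we can easily build and install the modules and the kernel (by the way, you can build the kernel with the freshly built clang by adding CC=clang):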



$ make -s -j $(getconf _NPROCESSORS_ONLN)
$ sudo make modules_install
$ sudo make install


and reboot into the new kernel (I use kexec from the kexec-tools package for this):



v=5.8.0-rc6+ # if you are rebuilding the currently running kernel, you can use v=`uname -r`
sudo kexec -l -t bzImage /boot/vmlinuz-$v --initrd=/boot/initrd.img-$v --reuse-cmdline &&
sudo kexec -e


bpftool



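The utility used most often in this article is bpftool, which ships as part of the Linux kernel. It is written and maintained by BPF developers for BPF developers, and it can manage all types of BPF objects: load programs, create and modify maps, explore the life of the BPF ecosystem, and so on. The documentation, in the form of man page sources, can be found in the kernel tree or, already compiled, online.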



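At the time of writing, bpftool comes prepackaged only for RHEL, Fedora and Ubuntu (see, for example, the thread telling the unfinished story of packaging bpftool in Debian). But if you have already built your own kernel, building bpftool is as easy as pie: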



$ cd ${linux}/tools/bpf/bpftool
# ... set up the paths to the latest clang, as described above
$ make -s

Auto-detecting system features:
...                        libbfd: [ on  ]
...        disassembler-four-args: [ on  ]
...                          zlib: [ on  ]
...                        libcap: [ on  ]
...               clang-bpf-co-re: [ on  ]

Auto-detecting system features:
...                        libelf: [ on  ]
...                          zlib: [ on  ]
...                           bpf: [ on  ]

$


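(Here ${linux} is your kernel source directory.) After running these commands, bpftool will be built in the directory ${linux}/tools/bpf/bpftool; it can be added to the path (first of all for the root user) or simply copied to /usr/local/sbin.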



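It is best to build bpftool with the latest clang, built as described above. To check that it was built correctly you can use, for example, the command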



$ sudo bpftool feature probe kernel
Scanning system configuration...
bpf() syscall for unprivileged users is enabled
JIT compiler is enabled
JIT compiler hardening is disabled
JIT compiler kallsyms exports are enabled for root
...


which shows which BPF features are enabled in your kernel.



By the way, the previous command can be run as



# bpftool f p k


This is modeled on the utilities from the iproute2 package, where we can, for example, say ip a s eth0 instead of ip addr show dev eth0.
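
The abbreviation mechanism itself is simple prefix matching. Below is a minimal sketch of the idea, not the actual bpftool or iproute2 code: the function name expand and the command table used with it are hypothetical, and the sketch assumes the first command in the table that starts with the typed word wins.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the first command in `cmds` that starts
 * with `word`, or NULL if the word is empty or matches nothing.
 */
const char *expand(const char *word, const char *const *cmds, size_t n)
{
    size_t len;

    assert(word != NULL && cmds != NULL);
    len = strlen(word);
    if (len == 0)
        return NULL;
    for (size_t i = 0; i < n; i++)
        if (strncmp(word, cmds[i], len) == 0)
            return cmds[i];
    return NULL;
}
```

With a table containing "prog", "map" and "feature", the word f expands to "feature" and m expands to "map". Since the first match wins, the order of the table decides which command an ambiguous abbreviation selects.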





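Conclusion



BPF lets you measure and change kernel functionality efficiently and on the fly. The system turned out very well, in the best UNIX tradition: a simple mechanism that makes it possible to (re)program the kernel has allowed a large number of people and organizations to experiment. And although the experiments, like the development of the BPF infrastructure itself, are far from finished, the system already has a stable ABI on which reliable and, more importantly, efficient business logic can be built.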






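This article, though not especially short, is only an introduction to the world of BPF and does not describe the "advanced" features and important parts of the architecture. The rough plan from here is as follows: the next article will be an overview of BPF program types (kernel 5.8 supports 30 program types), then we will finally look at how to write real BPF applications, using kernel tracing programs as an example, then it will be time for a more in-depth course on the BPF architecture, and after that, for examples of networking and security applications of BPF.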





Previous articles in this series



  1. BPF for the little ones, part zero: classic BPF




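Links



  1. BPF and XDP Reference Guide: BPF documentation from cilium, or more precisely from Daniel Borkmann, one of the BPF maintainers. It is one of the first serious descriptions, and it differs from the rest in that Daniel knows exactly what he is writing about. In particular, the document explains how to work with BPF programs of the XDP and TC types using the well-known ip utility from the iproute2 package.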



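  2. Documentation/networking/filter.txt: the original documentation file for classic and, later, extended BPF. Worth reading if you want to dig into the assembly language and the technical details of the architecture.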



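  3. The BPF blog from Facebook. It is updated rarely but to the point, since its authors are Alexei Starovoitov (the author of eBPF) and Andrii Nakryiko (the libbpf maintainer).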



  4. Secrets of bpftool. An entertaining Twitter thread from Quentin Monnet with examples and secrets of using bpftool.



  5. Dive into BPF: a list of reading material. A giant (and still maintained) list of links to BPF documentation from Quentin Monnet.





