BPF for the Little Ones, Part Two: The Variety of BPF Program Types

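We continue our series of articles on BPF, the general-purpose virtual machine of the Linux kernel. In this installment we look at which types of BPF programs exist and how they are used in the real world of capitalist cash. At the end of the article there are also a number of links, in particular to the two existing books on BPF.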



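Linux kernel 5.9 defines more than 30 different BPF program types, and I plan to write separate articles about some of them. This article is therefore inevitably an overview and contains fewer technical details than the previous ones. Even so, we will try to finally answer two questions: why is all of this needed, and why is there so much noise around BPF?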



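If you want to know how exactly BPF helps to solve problems such as DDoS protection, server load balancing, implementing the Kubernetes network stack, protecting systems from attacks, efficient tracing of 24x7 production systems, and many others, then welcome.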






Program types and table of contents



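All existing BPF program types are defined in the Linux kernel file include/uapi/linux/bpf.h. In the sections that follow I have tried to sort them into logical groups (subsections marked with an asterisk are background primers).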





BPF .



, - , , BPF_PROG_* .



0975 Alexei Starovoitov 2014-09-26 BPF_PROG_TYPE_UNSPEC
ddd8 Alexei Starovoitov 2014-12-01 BPF_PROG_TYPE_SOCKET_FILTER
2541 Alexei Starovoitov 2015-03-25 BPF_PROG_TYPE_KPROBE
96be Daniel Borkmann 2015-03-01 BPF_PROG_TYPE_SCHED_CLS
94ca Daniel Borkmann 2015-03-20 BPF_PROG_TYPE_SCHED_ACT
98b5 Alexei Starovoitov 2016-04-06 BPF_PROG_TYPE_TRACEPOINT
6a77 Brenden Blanco 2016-07-19 BPF_PROG_TYPE_XDP
0515 Alexei Starovoitov 2016-09-01 BPF_PROG_TYPE_PERF_EVENT
0e33 Daniel Mack 2016-11-23 BPF_PROG_TYPE_CGROUP_SKB
6102 David Ahern 2016-12-01 BPF_PROG_TYPE_CGROUP_SOCK
3a0a Thomas Graf 2016-11-30 BPF_PROG_TYPE_LWT_IN
3a0a Thomas Graf 2016-11-30 BPF_PROG_TYPE_LWT_OUT
3a0a Thomas Graf 2016-11-30 BPF_PROG_TYPE_LWT_XMIT
4030 Lawrence Brakmo 2017-06-30 BPF_PROG_TYPE_SOCK_OPS
b005 John Fastabend 2017-08-15 BPF_PROG_TYPE_SK_SKB
ebc6 Roman Gushchin 2017-11-05 BPF_PROG_TYPE_CGROUP_DEVICE
4f73 John Fastabend 2018-03-18 BPF_PROG_TYPE_SK_MSG
c4f6 Alexei Starovoitov 2018-03-28 BPF_PROG_TYPE_RAW_TRACEPOINT
4fba Andrey Ignatov 2018-03-30 BPF_PROG_TYPE_CGROUP_SOCK_ADDR
004d Mathieu Xhonneux 2018-05-20 BPF_PROG_TYPE_LWT_SEG6LOCAL
f436 Sean Young 2018-05-27 BPF_PROG_TYPE_LIRC_MODE2
2dbb Martin KaFai Lau 2018-08-08 BPF_PROG_TYPE_SK_REUSEPORT
d58e Petar Penkov 2018-09-14 BPF_PROG_TYPE_FLOW_DISSECTOR
7b14 Andrey Ignatov 2019-02-27 BPF_PROG_TYPE_CGROUP_SYSCTL
9df1 Matt Mullins 2019-04-26 BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE
0d01 Stanislav Fomichev 2019-06-27 BPF_PROG_TYPE_CGROUP_SOCKOPT
f1b9 Alexei Starovoitov 2019-10-30 BPF_PROG_TYPE_TRACING
27ae Martin KaFai Lau 2020-01-08 BPF_PROG_TYPE_STRUCT_OPS
be87 Alexei Starovoitov 2020-01-20 BPF_PROG_TYPE_EXT
fc61 KP Singh 2020-03-29 BPF_PROG_TYPE_LSM
e9dd Jakub Sitnicki 2020-07-17 BPF_PROG_TYPE_SK_LOOKUP


Linux



1992 «» (- ) ( ). , «LINUX is obsolete» , Linux ( 1992 ) , . «» , , , :



«, linux , , . , , , . ( ) linux . GNU , : , , . Linux , GNU " "»



, — BPF Linux . 2020 Martin KaFai Lau , . —  - , , - .



BPF: BPF_PROG_TYPE_STRUCT_OPS. , Daniel Borkman , BPF — , .



, BPF. - , . BPF tcp_congestion_ops, TCP congestion control. —  DCTCP CUBIC BPF.



, , BPF, , , (, BPF ) . , , — . . BPF Summit.



BPF



BPF Brendan Gregg, , Linux . bcc, bpftrace, «BPF Performance Tools», , BPF, .. Facebook Netflix, , BPF, 24x7. BPF — BPF .



? . BPF :





maps, , . , BPF, , , .



( bpftrace, ):



#! /usr/bin/env bpftrace

#include <linux/skbuff.h>
#include <linux/ip.h>

k:icmp_echo {
    $skb = (struct sk_buff *) arg0;
    $iphdr = (struct iphdr *) ($skb->head + $skb->network_header);
    @pingstats[ntop($iphdr->saddr), ntop($iphdr->daddr)]++;
}


, . kprobe icmp_echo, ICMPv4 echo request. , arg0 , — sk_buff, . IP @pingstats. , , IP ! , kprobe, user space, .



BPF, tracing:



  • BPF_PROG_TYPE_KPROBE: BPF kprobe, kretprobe, uprobe uretprobe. , (.. ), , , .
  • BPF_PROG_TYPE_PERF_EVENT: BPF perf.
  • BPF_PROG_TYPE_TRACEPOINT: BPF tracepoint. , kprobes? , tracepoints — API ( , / tracepoint ) , tracepoints ( ).
  • BPF_PROG_TYPE_RAW_TRACEPOINT: tracepoints . raw tracepoints BPF «» , ,
  • BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE: tracepoints (. )
  • BPF_PROG_TYPE_TRACING: , : tracepoints, , , ( : sudo cat /sys/kernel/debug/error_injection/list), «». , BTF — .


, —  Linux BPF, .



Linux Security Modules



Linux (security hooks), , , , .. Linux (Linux Security Modules LSM) SELinux, AppArmor, .., .



, . , API Kernel Runtime Security Instrumentation LSS-NA 2019. BPF, BPF_PROG_TYPE_LSM, , BPF LSM . , , BPF, ..



, , BPF, . , BPF. , LSM . user mode helper, BPF libbpf, libbpf .



KRSI KP Singh, KRSI, BPF Summit.



BPF



Tail calls



, , BPF . , BPF 4096 . , BPF . —  . tail calls.



tail calls . , , —  . BPF_MAP_TYPE_PROG_ARRAY, BPF ( ):





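A tail call is made with the helper bpf_tail_call. For example, bpf_tail_call(ctx, &map, 1), where ctx is the context of the calling program, transfers control to the program stored at index 1 of the map. This is a long jump rather than a function call: control never returns to the caller. A chain may contain at most 32 tail calls; if the limit is reached, or the slot in the map is empty, the jump does not happen and the caller simply continues.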
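To make the control flow concrete, here is a small user-space model of a tail-call chain in plain C. It is only a sketch and not BPF code: `model_ctx`, `model_prog_array`, `model_tail_call`, `model_prog_loop` and `MODEL_FELL_THROUGH` are made-up names, an ordinary array of function pointers stands in for the BPF_MAP_TYPE_PROG_ARRAY map, and an ordinary function call stands in for the jump. Note that the real kernel helper takes the context first, `bpf_tail_call(ctx, &map, index)`, and never returns on success; the model imitates that by having the caller immediately return the callee's verdict.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_TAIL_CALLS 32  /* the per-invocation limit mentioned above */
#define MODEL_MAP_SLOTS 4
#define MODEL_FELL_THROUGH (-1)  /* hypothetical marker: the tail call did not happen */

struct model_ctx {
    int tail_calls;  /* tail calls performed so far in this invocation */
    int runs;        /* how many program bodies have started */
};

typedef int (*model_prog_fn)(struct model_ctx *ctx);

/* Stand-in for a BPF_MAP_TYPE_PROG_ARRAY: index -> program. */
struct model_prog_array {
    model_prog_fn slots[MODEL_MAP_SLOTS];
};

/*
 * Model of a tail call. Returns 1 and stores the callee's verdict in
 * *verdict if control was transferred. Returns 0 if the index is out of
 * range, the slot is empty, or the limit is exhausted; the caller then
 * simply keeps executing.
 */
int model_tail_call(struct model_ctx *ctx, const struct model_prog_array *map,
                    size_t index, int *verdict)
{
    assert(ctx != NULL && map != NULL && verdict != NULL);
    if (index >= MODEL_MAP_SLOTS || map->slots[index] == NULL)
        return 0;
    if (ctx->tail_calls >= MODEL_MAX_TAIL_CALLS)
        return 0;
    ctx->tail_calls++;
    *verdict = map->slots[index](ctx);
    return 1;
}

struct model_prog_array model_map;  /* zero-initialised: all slots empty */

/* A program that keeps tail-calling slot 0 (itself, once installed). */
int model_prog_loop(struct model_ctx *ctx)
{
    int verdict;

    ctx->runs++;
    if (model_tail_call(ctx, &model_map, 0, &verdict))
        return verdict;         /* like a long jump: we never "come back" */
    return MODEL_FELL_THROUGH;  /* reached only when the tail call failed */
}
```

With slot 0 empty, `model_prog_loop` runs once and falls through. After `model_map.slots[0] = model_prog_loop`, a single invocation executes the body 33 times in this model (the initial run plus 32 tail calls) before the limit stops the chain.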



, 5.1 , , . tail calls, .



, , tail calls . , bpf_tail_call, ? — , .



XDP Features

, XDP .



, tail calls —  XDP features. XDP, , «» XDP . , , «» . , , , . tail calls, , . , , , - , « » — , , , .





tail calls . BPF, BPF_PROG_TYPE_EXT, . BPF trampoline TRACING , —  , .



, , xdp-dispatcher — , XDP . «» -, XDP , - . . Multiple XDP programs on a single interface—status and next steps Toke Høiland-Jørgensen Linux Plumbers 2020.



LIRC: Linux Infrared Remote Control



, BPF BPF_PROG_TYPE_LIRC_MODE2 . lwn Sean Young, , , .



, BPF , / . BPF , , , map. , - , -. , , bpf_rc_keydown :





, , lirc? Sean Young , BPF , , API: IR userspace ( ).



BPF



BPF Berkeley Labs, BPF Linux, , BPF .



«» —  XDP Linux —  - , / Linux.



Linux:



, BPF , , XDP, Linux. . ( , XDP, , .)



, Linux . , DMA , CPU. , , CPU, RAM.





Linux — top half bottom half. — , , (top half), (bottom half), softirq . , bottom halves, , softirq NET_RX.



softirq , struct sk_buff. sk_buff, socket buffer, —  Linux. Linux sk_buff. , : , , -, .., .. , sk_buff .





. , head end , data tail , net_header transport_hdr , , .. data —  «» .
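The relationship between the four pointers is easier to see in code. Below is a minimal user-space sketch, assuming a made-up `struct model_skb` that keeps only `head`, `data`, `tail` and `end` as plain pointers. The function names echo the kernel helpers skb_reserve, skb_put, skb_push and skb_pull, but these are simplified illustrations, not the kernel implementations (the real struct sk_buff has many more fields and may store some of these positions as offsets).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct sk_buff. */
struct model_skb {
    unsigned char *head;  /* start of the allocated buffer */
    unsigned char *data;  /* start of the valid packet bytes */
    unsigned char *tail;  /* end of the valid packet bytes */
    unsigned char *end;   /* end of the allocated buffer */
};

void model_skb_init(struct model_skb *skb, unsigned char *buf, size_t size)
{
    skb->head = skb->data = skb->tail = buf;
    skb->end = buf + size;
}

/* Leave headroom in front of the data (only valid on an empty buffer). */
void model_skb_reserve(struct model_skb *skb, size_t len)
{
    assert(skb->data == skb->tail);
    assert(len <= (size_t)(skb->end - skb->tail));
    skb->data += len;
    skb->tail += len;
}

/* Append len bytes at the tail; returns where the caller should write them. */
unsigned char *model_skb_put(struct model_skb *skb, size_t len)
{
    unsigned char *old_tail = skb->tail;

    assert(len <= (size_t)(skb->end - skb->tail));
    skb->tail += len;
    return old_tail;
}

/* Prepend len bytes (for example a header) by moving data towards head. */
unsigned char *model_skb_push(struct model_skb *skb, size_t len)
{
    assert(len <= (size_t)(skb->data - skb->head));
    skb->data -= len;
    return skb->data;
}

/* Strip len bytes from the front, as a receive path does after a header. */
unsigned char *model_skb_pull(struct model_skb *skb, size_t len)
{
    assert(len <= (size_t)(skb->tail - skb->data));
    skb->data += len;
    return skb->data;
}

/* Number of valid packet bytes currently between data and tail. */
size_t model_skb_len(const struct model_skb *skb)
{
    return (size_t)(skb->tail - skb->data);
}
```

For instance, reserving 34 bytes of headroom, putting an 8-byte payload, then pushing a 20-byte and a 14-byte header leaves `data` equal to `head` and a length of 42; pulling the 14-byte header again brings the length back to 28. No bytes are copied at any point, only the pointers move.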



netif_receive_skb . ? Netfilter wiki:





sk_buff , (ingress qdisc ), , netfiler. , (sk_buff) , — .



, ( ). start, softirq, eBPF XDP , ...



— Express Data Path



sk_buff — , , VLAN .. BPF XDP (Express Data Path) sk_buff.



XDP , , , RAM . struct xdp_md, , , . — —  XDP (XDP_DROP), , (XDP_TX), (XDP_REDIRECT), (XDP_PASS):





, / , , , , , MAC , XDP , / .
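The verdicts listed above can be illustrated with a user-space model of an XDP-style program. This is a sketch under stated assumptions, not loadable BPF: `struct model_xdp_md` is a made-up context holding two ordinary pointers (the real struct xdp_md exposes `data` and `data_end` as 32-bit fields), `MODEL_BLOCKED_PORT` is an arbitrary example value, and only IPv4/UDP is parsed. The verdict names and numeric values mirror enum xdp_action from include/uapi/linux/bpf.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same names and values as the kernel's enum xdp_action. */
enum model_xdp_action {
    XDP_ABORTED = 0,
    XDP_DROP,
    XDP_PASS,
    XDP_TX,
    XDP_REDIRECT
};

/* Hypothetical stand-in for struct xdp_md. */
struct model_xdp_md {
    const uint8_t *data;      /* first byte of the frame */
    const uint8_t *data_end;  /* one past the last byte of the frame */
};

#define MODEL_ETH_HLEN     14
#define MODEL_ETH_P_IP     0x0800
#define MODEL_IPPROTO_UDP  17
#define MODEL_BLOCKED_PORT 9999  /* arbitrary example port */

static uint16_t model_load_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Drop IPv4 UDP packets to MODEL_BLOCKED_PORT, pass everything else. */
int model_xdp_prog(const struct model_xdp_md *ctx)
{
    const uint8_t *p;
    size_t len, ip_off = MODEL_ETH_HLEN, ihl, udp_off;

    assert(ctx != NULL && ctx->data <= ctx->data_end);
    p = ctx->data;
    len = (size_t)(ctx->data_end - ctx->data);

    /* Every access is preceded by a bounds check, as the verifier demands. */
    if (len < MODEL_ETH_HLEN)
        return XDP_DROP;                     /* runt frame */
    if (model_load_be16(p + 12) != MODEL_ETH_P_IP)
        return XDP_PASS;                     /* not IPv4: leave it to the stack */
    if (len < ip_off + 20)
        return XDP_DROP;                     /* truncated IPv4 header */
    ihl = (size_t)(p[ip_off] & 0x0f) * 4;    /* header length in bytes */
    if (ihl < 20 || len < ip_off + ihl)
        return XDP_DROP;
    if (p[ip_off + 9] != MODEL_IPPROTO_UDP)
        return XDP_PASS;
    udp_off = ip_off + ihl;
    if (len < udp_off + 8)
        return XDP_DROP;                     /* truncated UDP header */
    if (model_load_be16(p + udp_off + 2) == MODEL_BLOCKED_PORT)
        return XDP_DROP;
    return XDP_PASS;
}
```

The model only ever returns XDP_DROP or XDP_PASS; XDP_TX and XDP_REDIRECT are declared to show the full set of verdicts. A real program would do the same parsing with pointer comparisons against `data_end`, which is the pattern the verifier recognises.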



XDP AF_XDP, , , zero copy. , DPDK, :





AF_XDP: AF_XDP (rx queue), . XDP, . , , , . ( , , , . , UDP 65784 AF_XDP, 13, , , : ethtool -N flow-type udp4 dst-port 65784 action 13.)



, XDP , . , , CPU 0%. Netronome, , .



«» XDP, : DDoS . , Facebook, load balancer katran, XDP, Cloudflare XDP DDoS load balancing, cilium XDP , .. R&D , XDP P4 , , ( NPU: Network Processing Unit).



XDP , — , . , XDP, XDP Tutorial, , , — kozlyuk .



struct __sk_buff



BPF, . , Linux sk_buff, . len — , network_header — L3, devstruct net_device , .



, — sk_buff ( XDP, sk_buff ), , BPF sk_buff. , —  BPF struct __sk_buff:



struct __sk_buff {
    __u32 len;
    __u32 pkt_type;
    __u32 mark;
    __u32 queue_mapping;
    __u32 protocol;
    __u32 vlan_present;
    ...
};


__sk_buff sk_buff, Verifier , . BPF __sk_buff :



int bpf_prog(struct __sk_buff *ctx)
{
    __u32 len = ctx->len;
    __u32 type = ctx->pkt_type;
    ...
}


Verifier ( ) , sk_buff. , :





, Verifier . , pkt_type, 3, Verifier , .



, / . , , , .



skbuff.c ( -, ):



#include <linux/bpf.h>

__attribute__((section("socket/test")))
int bpf_prog(struct __sk_buff *ctx)
{
    __u32 len = ctx->len;
    __u32 type = ctx->pkt_type;
    return len + type;
}


Compile it:



clang -target bpf -O2 skbuff.c -o skbuff.o -c


(, , , ):



mkdir mnt
sudo mount -t bpf none ./mnt
sudo bpftool prog load ./skbuff.o ./mnt/xxx


Disassemble the resulting object file:



$ llvm-objdump -D ./skbuff.o --section socket/test
  0:    61 12 00 00 00 00 00 00 r2 = *(u32 *)(r1 + 0)
  1:    61 10 04 00 00 00 00 00 r0 = *(u32 *)(r1 + 4)
  2:    0f 20 00 00 00 00 00 00 r0 += r2
  3:    95 00 00 00 00 00 00 00 exit


And now dump the same program as it looks after being loaded into the kernel:



$ sudo bpftool prog dump xlated pinned ./mnt/xxx
   0: (61) r2 = *(u32 *)(r1 +104)
   1: (71) r0 = *(u8 *)(r1 +120)
   2: (54) w0 &= 7
   3: (0f) r0 += r2
   4: (95) exit
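The difference between the two dumps can be reproduced in a few lines of ordinary C. The sketch below is a user-space model, not kernel code: `struct model_uapi_skb` contains just the first two fields of struct __sk_buff, and the "kernel object" is a plain byte array. The offsets 104 and 120 and the mask 7 are taken from the translated dump above; they describe that particular kernel build and should not be treated as stable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The stable view the BPF program is written against. */
struct model_uapi_skb {
    uint32_t len;       /* offset 0 in the object-file dump */
    uint32_t pkt_type;  /* offset 4 in the object-file dump */
};

/* Offsets and mask as seen in the translated dump (build-specific). */
#define MODEL_KERN_LEN_OFF      104
#define MODEL_KERN_PKT_TYPE_OFF 120
#define MODEL_PKT_TYPE_MASK     7
#define MODEL_KERN_SKB_SIZE     128  /* arbitrary size for the model */

/* What the source code asks for: ctx->len + ctx->pkt_type. */
uint32_t model_prog_as_written(const struct model_uapi_skb *ctx)
{
    return ctx->len + ctx->pkt_type;
}

/* What the rewritten instructions do against the real object. */
uint32_t model_prog_as_rewritten(const uint8_t *kern_skb)
{
    uint32_t len, type;

    assert(kern_skb != NULL);
    memcpy(&len, kern_skb + MODEL_KERN_LEN_OFF, sizeof len); /* r2 = *(u32 *)(r1 +104) */
    type = kern_skb[MODEL_KERN_PKT_TYPE_OFF];                /* r0 = *(u8 *)(r1 +120)  */
    type &= MODEL_PKT_TYPE_MASK;                             /* w0 &= 7                */
    return type + len;                                       /* r0 += r2               */
}
```

If the byte at offset 120 is 0xfb and the 32-bit value at offset 104 is 1500, the rewritten version returns 1503: the mask keeps only the three low bits (value 3) that belong to pkt_type and discards the neighbouring bit fields packed into the same byte. That is the same answer the program "as written" gives for `{ .len = 1500, .pkt_type = 3 }`.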


Linux



Linux, , , sk_buff, ingress qdisc. , , , egress qdisc — , , / netfilter.



Qdisc queueing discipline Linux — Traffic Control (TC). egress qdisc —  , . , - , .



— classful classless —  . — , . , egress qdisc, pfifo_fast, TOS IPv4 IPv6 ( . lartc 9.2):





— qdisc noqueue, , , .
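As an illustration of the pfifo_fast discipline mentioned above, here is a rough user-space model of a three-band strict-priority queue. It is a sketch with made-up names (`model_pfifo_fast`, `model_enqueue`, `model_dequeue`) and a tiny fixed capacity. It also assumes the caller has already chosen the band; in the real qdisc the band is derived from the packet's priority, which this model does not attempt to reproduce.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_BANDS    3
#define MODEL_BAND_CAP 8  /* arbitrary capacity for the model */

/* A small ring buffer of packet ids (ids are assumed non-negative). */
struct model_fifo {
    int pkts[MODEL_BAND_CAP];
    size_t head;   /* index of the oldest packet */
    size_t count;  /* number of queued packets */
};

struct model_pfifo_fast {
    struct model_fifo band[MODEL_BANDS];
};

/* Queue a packet into a band; returns 0, or -1 if the band is full. */
int model_enqueue(struct model_pfifo_fast *q, unsigned band, int pkt)
{
    struct model_fifo *f;

    assert(q != NULL && band < MODEL_BANDS);
    f = &q->band[band];
    if (f->count == MODEL_BAND_CAP)
        return -1;  /* tail drop */
    f->pkts[(f->head + f->count) % MODEL_BAND_CAP] = pkt;
    f->count++;
    return 0;
}

/* Take the oldest packet from the lowest-numbered non-empty band. */
int model_dequeue(struct model_pfifo_fast *q)
{
    assert(q != NULL);
    for (unsigned b = 0; b < MODEL_BANDS; b++) {
        struct model_fifo *f = &q->band[b];

        if (f->count > 0) {
            int pkt = f->pkts[f->head];

            f->head = (f->head + 1) % MODEL_BAND_CAP;
            f->count--;
            return pkt;
        }
    }
    return -1;  /* all bands are empty */
}
```

Enqueuing packet 10 into band 2, packets 20 and 21 into band 1, and packet 30 into band 0 yields the dequeue order 30, 20, 21, 10: a higher band is served only when every lower band is empty, and order within a band is first in, first out.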



Classful qdiscs . qdiscs. , . . , , : (classifiers) (actions). , , , , . : u32, flower .. : drop ( ), reclassify ( , , , VLAN tag), ..



, qdiscs , . qdiscs , C ? BPF, BPF_PROG_TYPE_SCHED_CLS BPF_PROG_TYPE_SCHED_ACT, , . , qdisc clsact, egress, ingress, BPF BPF_PROG_TYPE_SCHED_CLS direct action. — BPF — actions, .. .



BPF TC, — BPF Reference Guide Daniel Borkman cilium —  CNI kubernetes, Alibaba Google.



BPF



BPF — BPF , . eBPF cBPF, , eBPF cBPF. , BPF BPF_PROG_TYPE_SOCKET_FILTER SO_ATTACH_BPF. , , CAP_SYS_ADMIN.



BPF BPF_PROG_TYPE_SOCKET_FILTER , :





« » flower



__skb_flow_dissect, , Linux flow dissector — - . , , ingress Linux, flower.



, , ,   . BPF — BPF_PROG_TYPE_FLOW_DISSECTOR, BPF. namespace.



BPF



(cgroups) . , BPF : BPF cgroup. , ( , ) -, . cgroups , , , , - . BPF. , .



BPF_PROG_TYPE_CGROUP_SKB BPF (ingress) (egress) . 1, , 0, . , . , , .. : BPF systemd.



, BPF . , BPF BPF_PROG_TYPE_CGROUP_SOCK , struct sock. sk_bound_dev_if , . bind(2) / .



, , BPF BPF_PROG_TYPE_CGROUP_SOCK_ADDR. bind IP , ( use case : cgroup , , . ). connect, getpeername, getsockname, sendmsg recvmsg. , , , cilium iptables k8s.



BPF_PROG_TYPE_CGROUP_SOCKOPT setsockopt.



BPF_PROG_TYPE_CGROUP_DEVICE cgroupv2 , device cgroupsv1.



BPF_PROG_TYPE_CGROUP_SYSCTL sysctl , , , cgroup .



.



BPF



BPF_PROG_TYPE_SK_SKB . : SOCKMAP, . , , recvmsg, BPF, sk_buff . , Isovalent CNI cilium k8s, Cloudflare, . SOCKMAP - TCP splicing of the future.



BPF_PROG_TYPE_SK_SKB, BPF_PROG_TYPE_SK_MSG , sendmsg sendpage , L7 — , . BPF_PROG_TYPE_SK_SKB, sockmap .



BPF_PROG_TYPE_SK_REUSEPORT , SO_REUSEPORT. BPF, , .



BPF_PROG_TYPE_SK_LOOKUP , . : , IP , , , . namespaces.



, , TCP — BPF_PROG_TYPE_SOCK_OPS. cgroupv2, BPF_PROG_TYPE_CGROUP_SOCKOPT, , .., . , TCP , .



LWT:



, , . . , IPv4- IPv6-, VPN, .



, , . , , , , .





, Linux, : ip link add name ipip0 type ipip... .. 2015 Linux . , , , —  .



, , 2016 , ! ,   BPF :





input , output — , xmit —  . struct __sk_buff, (BPF_OK), (BPF_DROP), (BPF_REDIRECT) , , (BPF_DROP). , xmit —  , .



netlink . - , BPF iproute2, :



ip route add 10.0.0.0/24 encap bpf xmit obj <prog.o> section <section> dev <dev>


Here <prog.o> is the ELF object file containing the BPF program, and <section> is the name of the section that holds it.



2018 , BPF_PROG_TYPE_LWT_SEG6LOCAL, seg6local, . Using SRv6.





BPF: BPF_PROG_TYPE_UNSPEC. , , / . bpf(2) .



, ! 99% , . , BPF Linux BPF_PROG_TYPE_UNSPEC BPF, , , , tcpdump wireshark Linux, .



,



BPF Linux, , - . BPF , , Linux. , BPF Linux.



(, , ) Linux — BPF kprobes, tracepoints perf events, — libbpf, bcc bpftrace.







  1. BPF , : classic BPF
  2. BPF , : extended BPF


2,5





Online-,



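There are plenty of articles and talks about BPF, so we will take advantage of the fact that the above-mentioned company Isovalent is trying to lead the hype around BPF: in particular, it recently set up a documentation website and held the BPF Summit, a small conference dedicated to BPF. Fun fact: the participants of that BPF Summit chose a new BPF mascot, a bee, and came up with a catchy name for it, eBee.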





