[linux-yocto] [PATCH 1/2] netatop: Initial integration
He Zhe
zhe.he at windriver.com
Sat Apr 14 08:27:14 PDT 2018
On 2018-04-10 03:19, Bruce Ashfield wrote:
> On 2018-04-05 3:08 PM, Bruce Ashfield wrote:
>> On 2018-03-28 11:25 PM, He Zhe wrote:
>>> This is a net statistics collector for atop from:
>>> http://atoptool.nl/netatop.php
>>> https://atoptool.nl/download/netatop-2.0.tar.gz
>>>
>>> Two changes:
>>> Adjust paths for netatop.h and netatopversion.h
>>> Add module_init and module_exit
>>
>> Which kernel versions have you tested this on? Is there a way we could get
>> a README that describes how to build and test it?
>
> Also, is there a recipe that builds this that I can see/test ?
Sorry for the slow reply; I'm on vacation and will handle this ASAP.
Is this expected to be built as a module via a recipe? Or as a patch/feature in yocto-kernel-cache?
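If the yocto-kernel-cache route is preferred, I had something roughly like the
following feature fragment in mind (only a sketch to illustrate the question;
the file names and the CONFIG_NETATOP symbol are placeholders, since this
patch does not add Kconfig/Makefile entries yet):

  # features/netatop/netatop.scc (sketch only)
  define KFEATURE_DESCRIPTION "Per-task network statistics (netatop)"
  define KFEATURE_COMPATIBILITY board
  kconf non-hardware netatop.cfg

  # features/netatop/netatop.cfg (sketch only)
  CONFIG_NETATOP=m

The alternative I had in mind is an out-of-tree module recipe built from the
upstream tarball.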
Zhe
>
> Bruce
>
>>
>> Bruce
>>
>>>
>>> Signed-off-by: He Zhe <zhe.he at windriver.com>
>>> ---
>>> drivers/staging/netatop/netatop.c | 1843 ++++++++++++++++++++++++++++++
>>> drivers/staging/netatop/netatop.h | 47 +
>>> drivers/staging/netatop/netatopversion.h | 2 +
>>> 3 files changed, 1892 insertions(+)
>>> create mode 100644 drivers/staging/netatop/netatop.c
>>> create mode 100644 drivers/staging/netatop/netatop.h
>>> create mode 100644 drivers/staging/netatop/netatopversion.h
>>>
>>> diff --git a/drivers/staging/netatop/netatop.c b/drivers/staging/netatop/netatop.c
>>> new file mode 100644
>>> index 000000000000..3baa6520416a
>>> --- /dev/null
>>> +++ b/drivers/staging/netatop/netatop.c
>>> @@ -0,0 +1,1843 @@
>>> +/*
>>> +** This module uses the netfilter interface to maintain statistics
>>> +** about the network traffic per task, at the level of thread group
>>> +** and individual thread.
>>> +**
>>> +** General setup
>>> +** -------------
>>> +** Once the module is active, it is called for every packet that is
>>> +** transmitted by a local process and every packet that is received
>>> +** from an interface. Not only the packets that contain the user data
>>> +** are passed but also the TCP related protocol packets (SYN, ACK, ...).
>>> +**
>>> +** When the module discovers a packet for a connection (TCP) or local
>>> +** port (UDP) that is new, it creates a sockinfo structure. As soon as
>>> +** possible the sockinfo struct will be connected to a taskinfo struct
>>> +** that represents the process or thread that is related to the socket.
>>> +** However, the task can only be determined when a packet is transmitted,
>>> +** i.e. the module is called during system call handling in the context
>>> +** of the transmitting process. At that moment the tgid (process) and
>>> +** pid (thread) can be obtained from the process administration to
>>> +** be stored in the module's own taskinfo structs (one for the process,
>>> +** one for the thread).
>>> +** For the time that the sockinfo struct can not be related to a taskinfo
>>> +** struct (e.g. when only packets are received), counters are maintained
>>> +** temporarily in the sockinfo struct. As soon as a related taskinfo struct
>>> +** is discovered when the task transmits, counters will be maintained in
>>> +** the taskinfo struct itself.
>>> +** When packets are only received for a socket (e.g. another machine is
>>> +** sending UDP packets to the local machine) while the local task
>>> +** never responds, no match to a process can be made and the packets
>>> +** remain unidentified by the netatop module. At least one packet should
>>> +** have been sent by a local process to be able to match packets for such
>>> +** a socket.
>>> +** In the file /proc/netatop counters can be found that show the total
>>> +** number of packets sent/received and how many of these packets were
>>> +** unidentified (i.e. not accounted to a process/thread).
>>> +**
>>> +** Garbage collection
>>> +** ------------------
>>> +** The module uses a garbage collector to cleanup the unused sockinfo
>>> +** structs if connections do not exist any more (TCP) or have not been
>>> +** used for some time (TCP/UDP).
>>> +** Furthermore, the garbage collector checks if the taskinfo structs
>>> +** still represent existing processes or threads. If not, the taskinfo struct
>>> +** is destroyed (in case of a thread) or it is moved to a separate list of
>>> +** finished processes (in case of a process). Analysis programs can read
>>> +** the taskinfo of such finished process. When the taskinfo of a finished
>>> +** process is not read within 15 seconds, the taskinfo will be destroyed.
>>> +**
>>> +** A garbage collector cycle can be triggered by issuing a getsockopt
>>> +** call from an analysis program (e.g. atop). Apart from that, a time-based
>>> +** garbage collector cycle is issued anyhow every 15 seconds by the
>>> +** knetatop kernel thread.
>>> +**
>>> +** Interface with user mode
>>> +** ------------------------
>>> +** Programs can open an IP socket and use the getsockopt() system call
>>> +** to issue commands to this module. With the command NETATOP_GETCNT_TGID
>>> +** the current counters can be obtained on process level (thread group)
>>> +** and with the command NETATOP_GETCNT_PID the counters on thread level.
>>> +** For both commands, the tgid/pid of the required thread (group) has to
>>> +** be passed. When the required thread (group) does not exist, errno ESRCH
>>> +** is returned.
>>> +**
>>> +** The command NETATOP_GETCNT_EXIT can be issued to obtain the counters of
>>> +** an exited process. As stated above, such a command has to be issued
>>> +** within 15 seconds after a process has been declared 'finished' by
>>> +** the garbage collector. Whenever this command is issued and no exited
>>> +** process is in the exitlist, the requesting process is blocked until
>>> +** an exited process is available.
>>> +**
>>> +** The command NETATOP_FORCE_GC activates the garbage collector of the
>>> +** netatop module to determine if sockinfo's of old connections/ports
>>> +** can be destroyed and if taskinfo's of exited processes can be moved
>>> +** to the exitlist.
>>> +**
>>> +** The command NETATOP_EMPTY_EXIT can be issued to wait until the exitlist
>>> +** with the taskinfo's of exited processes is empty.
>>> +** ----------------------------------------------------------------------
>>> +** Copyright (C) 2012 Gerlof Langeveld (gerlof.langeveld at atoptool.nl)
>>> +**
>>> +** This program is free software; you can redistribute it and/or modify
>>> +** it under the terms of the GNU General Public License version 2 as
>>> +** published by the Free Software Foundation.
>>> +*/
>>> +#include <linux/init.h>
>>> +#include <linux/kernel.h>
>>> +#include <linux/module.h>
>>> +#include <linux/netfilter.h>
>>> +#include <linux/netfilter_ipv4.h>
>>> +#include <linux/sched.h>
>>> +#include <linux/skbuff.h>
>>> +#include <linux/types.h>
>>> +#include <linux/file.h>
>>> +#include <linux/kthread.h>
>>> +#include <linux/seq_file.h>
>>> +#include <linux/version.h>
>>> +#include <net/sock.h>
>>> +#include <linux/ip.h>
>>> +#include <linux/tcp.h>
>>> +#include <linux/udp.h>
>>> +#include <linux/proc_fs.h>
>>> +#include <net/ip.h>
>>> +#include <net/tcp.h>
>>> +#include <net/udp.h>
>>> +
>>> +#include "netatop.h"
>>> +#include "netatopversion.h"
>>> +
>>> +MODULE_LICENSE("GPL");
>>> +MODULE_AUTHOR("Gerlof Langeveld <gerlof.langeveld at atoptool.nl>");
>>> +MODULE_DESCRIPTION("Per-task network statistics");
>>> +MODULE_VERSION(NETATOPVERSION);
>>> +
>>> +#define GCINTERVAL (HZ*15) // interval garbage collector (jiffies)
>>> +#define GCMAXUDP (HZ*16) // max inactivity for UDP (jiffies)
>>> +#define GCMAXTCP (HZ*1800) // max inactivity for TCP (jiffies)
>>> +#define GCMAXUNREF (HZ*60) // max time without taskref (jiffies)
>>> +
>>> +#define SILIMIT (4096*1024) // maximum memory for sockinfo structs
>>> +#define TILIMIT (2048*1024) // maximum memory for taskinfo structs
>>> +
>>> +#define NF_IP_PRE_ROUTING 0
>>> +#define NF_IP_LOCAL_IN 1
>>> +#define NF_IP_FORWARD 2
>>> +#define NF_IP_LOCAL_OUT 3
>>> +#define NF_IP_POST_ROUTING 4
>>> +
>>> +/*
>>> +** struct that maintains statistics about the network
>>> +** traffic caused per thread or thread group
>>> +*/
>>> +struct chainer {
>>> + void *next;
>>> + void *prev;
>>> +};
>>> +
>>> +struct taskinfobucket;
>>> +
>>> +struct taskinfo {
>>> + struct chainer ch;
>>> +
>>> + pid_t id; // tgid or pid
>>> + char type; // 'g' (thread group) or
>>> + // 't' (thread)
>>> + unsigned char state; // see below
>>> + char command[COMLEN];
>>> + unsigned long btime; // start time of process
>>> + unsigned long long exittime; // time inserted in exitlist
>>> +
>>> + struct taskcount tc;
>>> +};
>>> +
>>> +static struct kmem_cache *ticache; // taskinfo cache
>>> +
>>> +// state values above
>>> +#define CHECKED 1 // verified that task still exists
>>> +#define INDELETE 2 // task exited but still in hash list
>>> +#define FINISHED 3 // task on exit list
>>> +
>>> +/*
>>> +** hash tables to find a particular thread group or thread
>>> +*/
>>> +#define TBUCKS 1024 // must be a power of 2!
>>> +#define THASH(x, t) (((x)+t)&(TBUCKS-1))
>>> +
>>> +struct taskinfobucket {
>>> + struct chainer ch;
>>> + spinlock_t lock;
>>> +} thash[TBUCKS];
>>> +
>>> +static unsigned long nrt; // current number of taskinfo allocated
>>> +static unsigned long nrt_ovf; // taskinfo allocations refused due to overflow
>>> +static DEFINE_SPINLOCK(nrtlock);
>>> +
>>> +
>>> +static struct taskinfo *exithead; // linked list of exited processes
>>> +static struct taskinfo *exittail;
>>> +static DEFINE_SPINLOCK(exitlock);
>>> +
>>> +static DECLARE_WAIT_QUEUE_HEAD(exitlist_filled);
>>> +static DECLARE_WAIT_QUEUE_HEAD(exitlist_empty);
>>> +
>>> +static unsigned long nre; // current number of taskinfo on exitlist
>>> +
>>> +/*
>>> +** structs that uniquely identify a TCP connection (host endian format)
>>> +*/
>>> +struct tcpv4_ident {
>>> + uint32_t laddr; /* local IP address */
>>> + uint32_t raddr; /* remote IP address */
>>> + uint16_t lport; /* local port number */
>>> + uint16_t rport; /* remote port number */
>>> +};
>>> +
>>> +struct tcpv6_ident {
>>> + struct in6_addr laddr; /* local IP address */
>>> + struct in6_addr raddr; /* remote IP address */
>>> + uint16_t lport; /* local port number */
>>> + uint16_t rport; /* remote port number */
>>> +};
>>> +
>>> +/*
>>> +** struct to maintain the reference from a socket
>>> +** to a thread and thread-group
>>> +*/
>>> +struct sockinfo {
>>> + struct chainer ch;
>>> +
>>> + unsigned char last_state; // last known state of socket
>>> + uint8_t proto; // protocol
>>> +
>>> + union keydef {
>>> + uint16_t udp; // UDP ident (only portnumber)
>>> + struct tcpv4_ident tcp4; // TCP connection ident IPv4
>>> + struct tcpv6_ident tcp6; // TCP connection ident IPv6
>>> + } key;
>>> +
>>> + struct taskinfo *tgp; // ref to thread group
>>> + struct taskinfo *thp; // ref to thread (or NULL)
>>> +
>>> + short tgh; // hash number of thread group
>>> + short thh; // hash number of thread
>>> +
>>> + unsigned int sndpacks; // temporary counters in case
>>> + unsigned int rcvpacks; // no relation to process is
>>> + unsigned long sndbytes; // known yet
>>> + unsigned long rcvbytes;
>>> +
>>> + unsigned long long lastact; // last updated (jiffies)
>>> +};
>>> +
>>> +static struct kmem_cache *sicache; // sockinfo cache
>>> +
>>> +/*
>>> +** hash table to find a socket reference
>>> +*/
>>> +#define SBUCKS 1024 // must be a power of 2!
>>> +#define SHASHTCP4(x) (((x).raddr+(x).lport+(x).rport)&(SBUCKS-1))
>>> +#define SHASHUDP(x) ((x)&(SBUCKS-1))
>>> +
>>> +struct {
>>> + struct chainer ch;
>>> + spinlock_t lock;
>>> +} shash[SBUCKS];
>>> +
>>> +static unsigned long nrs; // current number sockinfo allocated
>>> +static unsigned long nrs_ovf; // sockinfo allocations refused due to overflow
>>> +static DEFINE_SPINLOCK(nrslock);
>>> +
>>> +/*
>>> +** various static counters
>>> +*/
>>> +static unsigned long icmpsndbytes;
>>> +static unsigned long icmpsndpacks;
>>> +static unsigned long icmprcvbytes;
>>> +static unsigned long icmprcvpacks;
>>> +
>>> +static unsigned long tcpsndpacks;
>>> +static unsigned long tcprcvpacks;
>>> +static unsigned long udpsndpacks;
>>> +static unsigned long udprcvpacks;
>>> +static unsigned long unidentudpsndpacks;
>>> +static unsigned long unidentudprcvpacks;
>>> +static unsigned long unidenttcpsndpacks;
>>> +static unsigned long unidenttcprcvpacks;
>>> +
>>> +static unsigned long unknownproto;
>>> +
>>> +static DEFINE_MUTEX(gclock);
>>> +static unsigned long long gclast; // last garbage collection (jiffies)
>>> +
>>> +static struct task_struct *knetatop_task;
>>> +
>>> +static struct timespec boottime;
>>> +
>>> +/*
>>> +** function prototypes
>>> +*/
>>> +static void analyze_tcpv4_packet(struct sk_buff *,
>>> + const struct net_device *, int, char,
>>> + struct iphdr *, void *);
>>> +
>>> +static void analyze_udp_packet(struct sk_buff *,
>>> + const struct net_device *, int, char,
>>> + struct iphdr *, void *);
>>> +
>>> +static int sock2task(char, struct sockinfo *,
>>> + struct taskinfo **, short *,
>>> + struct sk_buff *, const struct net_device *,
>>> + int, char);
>>> +
>>> +static void update_taskcounters(struct sk_buff *,
>>> + const struct net_device *,
>>> + struct taskinfo *, char);
>>> +
>>> +static void update_sockcounters(struct sk_buff *,
>>> + const struct net_device *,
>>> + struct sockinfo *, char);
>>> +
>>> +static void sock2task_sync(struct sk_buff *,
>>> + struct sockinfo *, struct taskinfo *);
>>> +
>>> +static void register_unident(struct sockinfo *);
>>> +
>>> +static int calc_reallen(struct sk_buff *,
>>> + const struct net_device *);
>>> +
>>> +static void get_tcpv4_ident(struct iphdr *, void *,
>>> + char, union keydef *);
>>> +
>>> +static struct sockinfo *find_sockinfo(int, union keydef *, int, int);
>>> +static struct sockinfo *make_sockinfo(int, union keydef *, int, int);
>>> +
>>> +static void wipesockinfo(void);
>>> +static void wipetaskinfo(void);
>>> +static void wipetaskexit(void);
>>> +
>>> +static void garbage_collector(void);
>>> +static void gctaskexit(void);
>>> +static void gcsockinfo(void);
>>> +static void gctaskinfo(void);
>>> +
>>> +static void move_taskinfo(struct taskinfo *);
>>> +static void delete_taskinfo(struct taskinfo *);
>>> +static void delete_sockinfo(struct sockinfo *);
>>> +
>>> +static struct taskinfo *get_taskinfo(pid_t, char);
>>> +
>>> +static int getsockopt(struct sock *, int, void *, int *);
>>> +
>>> +static int netatop_open(struct inode *inode, struct file *file);
>>> +
>>> +/*
>>> +** hook definitions
>>> +*/
>>> +static struct nf_hook_ops hookin_ipv4;
>>> +static struct nf_hook_ops hookout_ipv4;
>>> +
>>> +/*
>>> +** getsockopt definitions for communication with user space
>>> +*/
>>> +static struct nf_sockopt_ops sockopts = {
>>> + .pf = PF_INET,
>>> + .get_optmin = NETATOP_BASE_CTL,
>>> + .get_optmax = NETATOP_BASE_CTL+6,
>>> + .get = getsockopt,
>>> + .owner = THIS_MODULE,
>>> +};
>>> +
>>> +static struct file_operations netatop_proc_fops = {
>>> + .open = netatop_open,
>>> + .read = seq_read,
>>> + .llseek = seq_lseek,
>>> + .release = single_release,
>>> + .owner = THIS_MODULE,
>>> +};
>>> +
>>> +
>>> +/*
>>> +** hook function to be called for every incoming local packet
>>> +*/
>>> +#if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 10, 0)
>>> +# if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 4, 0)
>>> +# define HOOK_ARG_TYPE void *priv
>>> +# else
>>> +# define HOOK_ARG_TYPE const struct nf_hook_ops *ops
>>> +# endif
>>> +#else
>>> +# define HOOK_ARG_TYPE unsigned int hooknum
>>> +#endif
>>> +
>>> +#if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 1, 0)
>>> +#define HOOK_STATE_ARGS const struct nf_hook_state *state
>>> +
>>> +#define DEV_IN state->in
>>> +#define DEV_OUT state->out
>>> +#else
>>> +# ifndef __GENKSYMS__
>>> +# define HOOK_STATE_ARGS const struct net_device *in, \
>>> + const struct net_device *out, \
>>> + const struct nf_hook_state *state
>>> +# else
>>> +# define HOOK_STATE_ARGS const struct net_device *in, \
>>> + const struct net_device *out, \
>>> + int (*okfn)(struct sk_buff *)
>>> +# endif
>>> +#define DEV_IN in
>>> +#define DEV_OUT out
>>> +#endif
>>> +
>>> +
>>> +static unsigned int
>>> +ipv4_hookin(HOOK_ARG_TYPE, struct sk_buff *skb, HOOK_STATE_ARGS)
>>> +{
>>> + struct iphdr *iph;
>>> + void *trh;
>>> +
>>> + if (skb == NULL) // useless socket buffer?
>>> + return NF_ACCEPT;
>>> +
>>> + /*
>>> + ** get pointer to IP header and transport header
>>> + */
>>> + iph = (struct iphdr *)skb_network_header(skb);
>>> + trh = ((char *)iph + (iph->ihl * 4));
>>> +
>>> + /*
>>> + ** react on protocol number
>>> + */
>>> + switch (iph->protocol) {
>>> + case IPPROTO_TCP:
>>> + tcprcvpacks++;
>>> + analyze_tcpv4_packet(skb, DEV_IN, 0, 'i', iph, trh);
>>> + break;
>>> +
>>> + case IPPROTO_UDP:
>>> + udprcvpacks++;
>>> + analyze_udp_packet(skb, DEV_IN, 0, 'i', iph, trh);
>>> + break;
>>> +
>>> + case IPPROTO_ICMP:
>>> + icmprcvpacks++;
>>> + icmprcvbytes += skb->len + DEV_IN->hard_header_len + 4;
>>> + break;
>>> +
>>> + default:
>>> + unknownproto++;
>>> + }
>>> +
>>> + // accept every packet after stats gathering
>>> + return NF_ACCEPT;
>>> +}
>>> +
>>> +/*
>>> +** hook function to be called for every outgoing local packet
>>> +*/
>>> +static unsigned int
>>> +ipv4_hookout(HOOK_ARG_TYPE, struct sk_buff *skb, HOOK_STATE_ARGS)
>>> +{
>>> + int in_syscall = !in_interrupt();
>>> + struct iphdr *iph;
>>> + void *trh;
>>> +
>>> + if (skb == NULL) // useless socket buffer?
>>> + return NF_ACCEPT;
>>> +
>>> + /*
>>> + ** get pointer to IP header and transport header
>>> + */
>>> + iph = (struct iphdr *)skb_network_header(skb);
>>> + trh = skb_transport_header(skb);
>>> +
>>> + /*
>>> + ** react on protocol number
>>> + */
>>> + switch (iph->protocol) {
>>> + case IPPROTO_TCP:
>>> + tcpsndpacks++;
>>> + analyze_tcpv4_packet(skb, DEV_OUT, in_syscall, 'o', iph, trh);
>>> + break;
>>> +
>>> + case IPPROTO_UDP:
>>> + udpsndpacks++;
>>> + analyze_udp_packet(skb, DEV_OUT, in_syscall, 'o', iph, trh);
>>> + break;
>>> +
>>> + case IPPROTO_ICMP:
>>> + icmpsndpacks++;
>>> + icmpsndbytes += skb->len + DEV_OUT->hard_header_len + 4;
>>> + break;
>>> +
>>> + default:
>>> + unknownproto++;
>>> + }
>>> +
>>> + // accept every packet after stats gathering
>>> + return NF_ACCEPT;
>>> +}
>>> +
>>> +/*
>>> +** generic function (for input and output) to analyze the current packet
>>> +*/
>>> +static void
>>> +analyze_tcpv4_packet(struct sk_buff *skb,
>>> + const struct net_device *ndev, // interface description
>>> + int in_syscall, // called during system call?
>>> + char direction, // incoming ('i') or outgoing ('o')
>>> + struct iphdr *iph, void *trh)
>>> +{
>>> + union keydef key;
>>> + struct sockinfo *sip;
>>> + int bs; // hash bucket for sockinfo
>>> + unsigned long sflags;
>>> +
>>> + /*
>>> + ** determine tcpv4_ident that identifies this TCP packet
>>> + ** and calculate hash bucket in sockinfo hash
>>> + */
>>> + get_tcpv4_ident(iph, trh, direction, &key);
>>> +
>>> + /*
>>> + ** check if we have seen this tcpv4_ident before with a
>>> + ** corresponding thread and thread group
>>> + */
>>> + bs = SHASHTCP4(key.tcp4);
>>> +
>>> + spin_lock_irqsave(&shash[bs].lock, sflags);
>>> +
>>> + if ( (sip = find_sockinfo(IPPROTO_TCP, &key, sizeof key.tcp4, bs))
>>> + == NULL) {
>>> + // no sockinfo yet: create one
>>> + if ( (sip = make_sockinfo(IPPROTO_TCP, &key,
>>> + sizeof key.tcp4, bs)) == NULL) {
>>> + if (direction == 'i')
>>> + unidenttcprcvpacks++;
>>> + else
>>> + unidenttcpsndpacks++;
>>> + goto unlocks;
>>> + }
>>> + }
>>> +
>>> + if (skb->sk)
>>> + sip->last_state = skb->sk->sk_state;
>>> +
>>> + /*
>>> + ** if needed (re)connect the sockinfo to a taskinfo and update
>>> + ** the counters
>>> + */
>>> +
>>> + // connect to thread group and update
>>> + if (sock2task('g', sip, &sip->tgp, &sip->tgh,
>>> + skb, ndev, in_syscall, direction)) {
>>> + // connect to thread and update
>>> + (void) sock2task('t', sip, &sip->thp, &sip->thh,
>>> + skb, ndev, in_syscall, direction);
>>> + }
>>> +
>>> +unlocks:
>>> + spin_unlock_irqrestore(&shash[bs].lock, sflags);
>>> +}
>>> +
>>> +
>>> +/*
>>> +** generic function (for input and output) to analyze the current packet
>>> +*/
>>> +static void
>>> +analyze_udp_packet(struct sk_buff *skb,
>>> + const struct net_device *ndev, // interface description
>>> + int in_syscall, // called during system call?
>>> + char direction, // incoming ('i') or outgoing ('o')
>>> + struct iphdr *iph, void *trh)
>>> +{
>>> + struct udphdr *udph = (struct udphdr *)trh;
>>> + uint16_t udplocal = (direction == 'i' ?
>>> + ntohs(udph->dest) : ntohs(udph->source));
>>> + int bs; // hash bucket for sockinfo
>>> +
>>> + union keydef key;
>>> + struct sockinfo *sip;
>>> + unsigned long sflags;
>>> +
>>> + /*
>>> + ** check if we have seen this local UDP port before with a
>>> + ** corresponding thread and thread group
>>> + */
>>> + key.udp = udplocal;
>>> + bs = SHASHUDP(udplocal);
>>> +
>>> + spin_lock_irqsave(&shash[bs].lock, sflags);
>>> +
>>> + if ( (sip = find_sockinfo(IPPROTO_UDP, &key, sizeof key.udp, bs))
>>> + == NULL) {
>>> + // no sockinfo yet: create one
>>> + if ( (sip = make_sockinfo(IPPROTO_UDP, &key,
>>> + sizeof key.udp, bs)) == NULL) {
>>> + if (direction == 'i')
>>> + unidentudprcvpacks++;
>>> + else
>>> + unidentudpsndpacks++;
>>> + goto unlocks;
>>> + }
>>> + }
>>> +
>>> + /*
>>> + ** if needed (re)connect the sockinfo to a taskinfo and update
>>> + ** the counters
>>> + */
>>> +
>>> + // connect to thread group and update
>>> + if (sock2task('g', sip, &sip->tgp, &sip->tgh,
>>> + skb, ndev, in_syscall, direction)) {
>>> + // connect to thread and update
>>> + (void) sock2task('t', sip, &sip->thp, &sip->thh,
>>> + skb, ndev, in_syscall, direction);
>>> + }
>>> +
>>> +unlocks:
>>> + spin_unlock_irqrestore(&shash[bs].lock, sflags);
>>> +}
>>> +
>>> +/*
>>> +** connect the sockinfo to the correct taskinfo and update the counters
>>> +*/
>>> +static int
>>> +sock2task(char idtype, struct sockinfo *sip, struct taskinfo **tipp,
>>> + short *hash, struct sk_buff *skb, const struct net_device *ndev,
>>> + int in_syscall, char direction)
>>> +{
>>> + pid_t curid;
>>> + unsigned long tflags;
>>> +
>>> + if (*tipp == NULL) {
>>> + /*
>>> + ** no taskinfo connected yet for this reference from
>>> + ** sockinfo; to connect to a taskinfo, we must
>>> + ** be in system call handling now --> verify
>>> + */
>>> + if (!in_syscall) {
>>> + if (idtype == 'g')
>>> + update_sockcounters(skb, ndev, sip, direction);
>>> +
>>> + return 0; // failed
>>> + }
>>> +
>>> + /*
>>> + ** try to find existing taskinfo or create new taskinfo
>>> + */
>>> + curid = (idtype == 'g' ? current->tgid : current->pid);
>>> +
>>> + *hash = THASH(curid, idtype); // calc hash bucket
>>> +
>>> + spin_lock_irqsave(&thash[*hash].lock, tflags);
>>> +
>>> + if ( (*tipp = get_taskinfo(curid, idtype)) == NULL) {
>>> + /*
>>> + ** not possible to connect
>>> + */
>>> + spin_unlock_irqrestore(&thash[*hash].lock, tflags);
>>> +
>>> + if (idtype == 'g')
>>> + update_sockcounters(skb, ndev, sip, direction);
>>> +
>>> + return 0; // failed
>>> + }
>>> +
>>> + /*
>>> + ** new connection made:
>>> + ** update task counters with sock counters
>>> + */
>>> + sock2task_sync(skb, sip, *tipp);
>>> + } else {
>>> + /*
>>> + ** already related to thread group or thread
>>> + ** lock existing task
>>> + */
>>> + spin_lock_irqsave(&thash[*hash].lock, tflags);
>>> +
>>> + /*
>>> + ** check if socket has been passed to another process in the
>>> +** meantime, as programs like xinetd tend to do;
>>> + ** if so, connect sockinfo to the new task
>>> + */
>>> + if (in_syscall) {
>>> + curid = (idtype == 'g' ? current->tgid : current->pid);
>>> +
>>> + if ((*tipp)->id != curid) {
>>> + spin_unlock_irqrestore(&thash[*hash].lock,
>>> + tflags);
>>> + *hash = THASH(curid, idtype);
>>> +
>>> + spin_lock_irqsave(&thash[*hash].lock, tflags);
>>> +
>>> + if ( (*tipp = get_taskinfo(curid, idtype))
>>> + == NULL) {
>>> + spin_unlock_irqrestore(
>>> + &thash[*hash].lock, tflags);
>>> + return 0;
>>> + }
>>> + }
>>> + }
>>> + }
>>> +
>>> + update_taskcounters(skb, ndev, *tipp, direction);
>>> +
>>> + spin_unlock_irqrestore(&thash[*hash].lock, tflags);
>>> +
>>> + return 1;
>>> +}
>>> +
>>> +/*
>>> +** update the statistics of a particular thread group or thread
>>> +*/
>>> +static void
>>> +update_taskcounters(struct sk_buff *skb, const struct net_device *ndev,
>>> + struct taskinfo *tip, char direction)
>>> +{
>>> + struct iphdr *iph = (struct iphdr *)skb_network_header(skb);
>>> + int reallen = calc_reallen(skb, ndev);
>>> +
>>> + switch (iph->protocol) {
>>> + case IPPROTO_TCP:
>>> + if (direction == 'i') {
>>> + tip->tc.tcprcvpacks++;
>>> + tip->tc.tcprcvbytes += reallen;
>>> + } else {
>>> + tip->tc.tcpsndpacks++;
>>> + tip->tc.tcpsndbytes += reallen;
>>> + }
>>> + break;
>>> +
>>> + case IPPROTO_UDP:
>>> + if (direction == 'i') {
>>> + tip->tc.udprcvpacks++;
>>> + tip->tc.udprcvbytes += reallen;
>>> + } else {
>>> + tip->tc.udpsndpacks++;
>>> + tip->tc.udpsndbytes += reallen;
>>> + }
>>> + }
>>> +}
>>> +
>>> +/*
>>> +** update the statistics of a sockinfo without a connected task
>>> +*/
>>> +static void
>>> +update_sockcounters(struct sk_buff *skb, const struct net_device *ndev,
>>> + struct sockinfo *sip, char direction)
>>> +{
>>> + int reallen = calc_reallen(skb, ndev);
>>> +
>>> + if (direction == 'i') {
>>> + sip->rcvpacks++;
>>> + sip->rcvbytes += reallen;
>>> + } else {
>>> + sip->sndpacks++;
>>> + sip->sndbytes += reallen;
>>> + }
>>> +}
>>> +
>>> +/*
>>> +** add the temporary counters in the sockinfo to the new connected task
>>> +*/
>>> +static void
>>> +sock2task_sync(struct sk_buff *skb, struct sockinfo *sip, struct taskinfo *tip)
>>> +{
>>> + struct iphdr *iph = (struct iphdr *)skb_network_header(skb);
>>> +
>>> + switch (iph->protocol) {
>>> + case IPPROTO_TCP:
>>> + tip->tc.tcprcvpacks += sip->rcvpacks;
>>> + tip->tc.tcprcvbytes += sip->rcvbytes;
>>> + tip->tc.tcpsndpacks += sip->sndpacks;
>>> + tip->tc.tcpsndbytes += sip->sndbytes;
>>> + break;
>>> +
>>> + case IPPROTO_UDP:
>>> + tip->tc.udprcvpacks += sip->rcvpacks;
>>> + tip->tc.udprcvbytes += sip->rcvbytes;
>>> + tip->tc.udpsndpacks += sip->sndpacks;
>>> + tip->tc.udpsndbytes += sip->sndbytes;
>>> + }
>>> +}
>>> +
>>> +static void
>>> +register_unident(struct sockinfo *sip)
>>> +{
>>> + switch (sip->proto) {
>>> + case IPPROTO_TCP:
>>> + unidenttcprcvpacks += sip->rcvpacks;
>>> + unidenttcpsndpacks += sip->sndpacks;
>>> + break;
>>> +
>>> + case IPPROTO_UDP:
>>> + unidentudprcvpacks += sip->rcvpacks;
>>> + unidentudpsndpacks += sip->sndpacks;
>>> + }
>>> +}
>>> +
>>> +/*
>>> +** calculate the number of bytes that are really sent or received
>>> +*/
>>> +static int
>>> +calc_reallen(struct sk_buff *skb, const struct net_device *ndev)
>>> +{
>>> + /*
>>> + ** calculate the real load of this packet on the network:
>>> + **
>>> + ** - length of IP header, TCP/UDP header and data (skb->len)
>>> + **
>>> + ** since packet assembly/disassembly is done by the IP layer
>>> + ** (we get an input packet that has been assembled already and
>>> + ** an output packet that still has to be disassembled), additional
>>> + ** IP headers/interface headers and interface headers have
>>> + ** to be calculated for packets that are larger than the mtu
>>> + **
>>> + ** - interface header length + 4 bytes crc
>>> + */
>>> + int reallen = skb->len;
>>> +
>>> + if (reallen > ndev->mtu)
>>> + reallen += (reallen / ndev->mtu) *
>>> + (sizeof(struct iphdr) + ndev->hard_header_len + 4);
>>> +
>>> + reallen += ndev->hard_header_len + 4;
>>> +
>>> + return reallen;
>>> +}
>>> +
>>> +/*
>>> +** find the tcpv4_ident for the current packet, represented by
>>> +** the skb_buff
>>> +*/
>>> +static void
>>> +get_tcpv4_ident(struct iphdr *iph, void *trh, char direction, union keydef *key)
>>> +{
>>> + struct tcphdr *tcph = (struct tcphdr *)trh;
>>> +
>>> + memset(key, 0, sizeof *key); // important for memcmp later on
>>> +
>>> + /*
>>> + ** determine local/remote IP address and
>>> + ** determine local/remote port number
>>> + */
>>> + switch (direction) {
>>> + case 'i': // incoming packet
>>> + key->tcp4.laddr = ntohl(iph->daddr);
>>> + key->tcp4.raddr = ntohl(iph->saddr);
>>> + key->tcp4.lport = ntohs(tcph->dest);
>>> + key->tcp4.rport = ntohs(tcph->source);
>>> + break;
>>> +
>>> + case 'o': // outgoing packet
>>> + key->tcp4.laddr = ntohl(iph->saddr);
>>> + key->tcp4.raddr = ntohl(iph->daddr);
>>> + key->tcp4.lport = ntohs(tcph->source);
>>> + key->tcp4.rport = ntohs(tcph->dest);
>>> + }
>>> +}
>>> +
>>> +/*
>>> +** search for the sockinfo holding the given address info
>>> +** the appropriate hash bucket must have been locked before calling
>>> +*/
>>> +static struct sockinfo *
>>> +find_sockinfo(int proto, union keydef *identp, int identsz, int hash)
>>> +{
>>> + struct sockinfo *sip = shash[hash].ch.next;
>>> +
>>> + /*
>>> + ** search for appropriate struct
>>> + */
>>> + while (sip != (void *)&shash[hash].ch) {
>>> + if ( memcmp(&sip->key, identp, identsz) == 0 &&
>>> + sip->proto == proto) {
>>> + sip->lastact = jiffies_64;
>>> + return sip;
>>> + }
>>> +
>>> + sip = sip->ch.next;
>>> + }
>>> +
>>> + return NULL; // not existing
>>> +}
>>> +
>>> +/*
>>> +** create a new sockinfo and fill
>>> +** the appropriate hash bucket must have been locked before calling
>>> +*/
>>> +static struct sockinfo *
>>> +make_sockinfo(int proto, union keydef *identp, int identsz, int hash)
>>> +{
>>> + struct sockinfo *sip;
>>> + unsigned long flags;
>>> +
>>> + /*
>>> + ** check if the threshold of memory used for sockinfo structs
>>> +** is reached, so that a fork bomb of processes opening
>>> +** sockets cannot lead to memory overload
>>> + */
>>> + if ( (nrs+1) * sizeof(struct sockinfo) > SILIMIT) {
>>> + spin_lock_irqsave(&nrslock, flags);
>>> + nrs_ovf++;
>>> + spin_unlock_irqrestore(&nrslock, flags);
>>> + return NULL;
>>> + }
>>> +
>>> + if ( (sip = kmem_cache_alloc(sicache, GFP_ATOMIC)) == NULL)
>>> + return NULL;
>>> +
>>> + spin_lock_irqsave(&nrslock, flags);
>>> + nrs++;
>>> + spin_unlock_irqrestore(&nrslock, flags);
>>> +
>>> + /*
>>> + ** insert new struct in doubly linked list
>>> + */
>>> + memset(sip, '\0', sizeof *sip);
>>> +
>>> + sip->ch.next = &shash[hash].ch;
>>> + sip->ch.prev = shash[hash].ch.prev;
>>> + ((struct sockinfo *)shash[hash].ch.prev)->ch.next = sip;
>>> + shash[hash].ch.prev = sip;
>>> +
>>> + sip->proto = proto;
>>> + sip->lastact = jiffies_64;
>>> + sip->key = *identp;
>>> +
>>> + return sip;
>>> +}
>>> +
>>> +/*
>>> +** search the taskinfo structure holding the info about the given id/type
>>> +** if such taskinfo is not yet present, create a new one
>>> +*/
>>> +static struct taskinfo *
>>> +get_taskinfo(pid_t id, char type)
>>> +{
>>> + int bt = THASH(id, type);
>>> + struct taskinfo *tip = thash[bt].ch.next;
>>> + unsigned long tflags;
>>> +
>>> + /*
>>> + ** search if id exists already
>>> + */
>>> + while (tip != (void *)&thash[bt].ch) {
>>> + if (tip->id == id && tip->type == type)
>>> + return tip;
>>> +
>>> + tip = tip->ch.next;
>>> + }
>>> +
>>> + /*
>>> + ** check if the threshold of memory used for taskinfo structs
>>> +** is reached, so that a fork bomb of processes opening
>>> +** sockets cannot lead to memory overload
>>> + */
>>> + if ( (nre+nrt+1) * sizeof(struct taskinfo) > TILIMIT) {
>>> + spin_lock_irqsave(&nrtlock, tflags);
>>> + nrt_ovf++;
>>> + spin_unlock_irqrestore(&nrtlock, tflags);
>>> + return NULL;
>>> + }
>>> +
>>> + /*
>>> + ** id not known yet
>>> + ** add new entry to hash list
>>> + */
>>> + if ( (tip = kmem_cache_alloc(ticache, GFP_ATOMIC)) == NULL)
>>> + return NULL;
>>> +
>>> + spin_lock_irqsave(&nrtlock, tflags);
>>> + nrt++;
>>> + spin_unlock_irqrestore(&nrtlock, tflags);
>>> +
>>> + /*
>>> + ** insert new struct in doubly linked list
>>> + ** and fill values
>>> + */
>>> + memset(tip, '\0', sizeof *tip);
>>> +
>>> + tip->ch.next = &thash[bt].ch;
>>> + tip->ch.prev = thash[bt].ch.prev;
>>> + ((struct taskinfo *)thash[bt].ch.prev)->ch.next = tip;
>>> + thash[bt].ch.prev = tip;
>>> +
>>> + tip->id = id;
>>> + tip->type = type;
>>> +
>>> +#if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 17, 0)
>>> + tip->btime = div_u64((current->real_start_time +
>>> + (boottime.tv_sec * NSEC_PER_SEC +
>>> + boottime.tv_nsec)), NSEC_PER_SEC);
>>> +#else
>>> + // current->real_start_time is type struct timespec here
>>> + tip->btime = current->real_start_time.tv_sec + boottime.tv_sec;
>>> +
>>> + if (current->real_start_time.tv_nsec + boottime.tv_nsec > NSEC_PER_SEC)
>>> + tip->btime++;
>>> +#endif
>>> +
>>> + strncpy(tip->command, current->comm, COMLEN);
>>> +
>>> + return tip;
>>> +}
>>> +
>>> +/*
>>> +** garbage collector that removes:
>>> +** - exited tasks that are no longer used by user mode programs
>>> +** - sockinfo's that are not used any more
>>> +** - taskinfo's that do not exist any more
>>> +**
>>> +** a mutex prevents the garbage collector from running several times in parallel
>>> +**
>>> +** this function may only be called in process context!
>>> +*/
>>> +static void
>>> +garbage_collector(void)
>>> +{
>>> + mutex_lock(&gclock);
>>> +
>>> + if (jiffies_64 < gclast + (HZ/2)) { // maximum 2 GC cycles per second
>>> + mutex_unlock(&gclock);
>>> + return;
>>> + }
>>> +
>>> + gctaskexit(); // remove remaining taskinfo structs from exit list
>>> +
>>> + gcsockinfo(); // clean up sockinfo structs in shash list
>>> +
>>> + gctaskinfo(); // clean up taskinfo structs in thash list
>>> +
>>> + gclast = jiffies_64;
>>> +
>>> + mutex_unlock(&gclock);
>>> +}
>>> +
>>> +/*
>>> +** tasks in the exitlist can be read by a user mode process for a limited
>>> +** amount of time; this function removes all taskinfo structures that have
>>> +** not been read within that period of time
>>> +** notice that exited processes are chained to the tail, so the oldest
>>> +** can be found at the head
>>> +*/
>>> +static void
>>> +gctaskexit()
>>> +{
>>> + unsigned long flags;
>>> + struct taskinfo *tip;
>>> +
>>> + spin_lock_irqsave(&exitlock, flags);
>>> +
>>> + for (tip=exithead; tip;) {
>>> + if (jiffies_64 < tip->exittime + GCINTERVAL)
>>> + break;
>>> +
>>> + // remove taskinfo from exitlist
>>> + exithead = tip->ch.next;
>>> + kmem_cache_free(ticache, tip);
>>> + nre--;
>>> + tip = exithead;
>>> + }
>>> +
>>> + /*
>>> + ** if list empty now, then exithead and exittail both NULL
>>> + ** wakeup waiters for emptylist
>>> + */
>>> + if (nre == 0) {
>>> + exittail = NULL;
>>> + wake_up_interruptible(&exitlist_empty);
>>> + }
>>> +
>>> + spin_unlock_irqrestore(&exitlock, flags);
>>> +}
>>> +
>>> +/*
>>> +** cleanup sockinfo structures that are connected to finished processes
>>> +*/
>>> +static void
>>> +gcsockinfo()
>>> +{
>>> + int i;
>>> + struct sockinfo *sip, *sipsave;
>>> + unsigned long sflags, tflags;
>>> + struct pid *pid;
>>> +
>>> + /*
>>> + ** go through all sockinfo hash buckets
>>> + */
>>> + for (i=0; i < SBUCKS; i++) {
>>> + if (shash[i].ch.next == (void *)&shash[i].ch)
>>> + continue; // quick return without lock
>>> +
>>> + spin_lock_irqsave(&shash[i].lock, sflags);
>>> +
>>> + sip = shash[i].ch.next;
>>> +
>>> + /*
>>> + ** search all sockinfo structs chained in one bucket
>>> + */
>>> + while (sip != (void *)&shash[i].ch) {
>>> + /*
>>> + ** TCP connections that were not in
>>> + ** state ESTABLISHED or LISTEN can be
>>> + ** eliminated
>>> + */
>>> + if (sip->proto == IPPROTO_TCP) {
>>> + switch (sip->last_state) {
>>> + case TCP_ESTABLISHED:
>>> + case TCP_LISTEN:
>>> + break;
>>> +
>>> + default:
>>> + sipsave = sip->ch.next;
>>> + delete_sockinfo(sip);
>>> + sip = sipsave;
>>> + continue;
>>> + }
>>> + }
>>> +
>>> + /*
>>> + ** check if this sockinfo has had no relation
>>> + ** with a thread group for a while
>>> + ** if so, delete the sockinfo
>>> + */
>>> + if (sip->tgp == NULL) {
>>> + if (sip->lastact + GCMAXUNREF < jiffies_64) {
>>> + register_unident(sip);
>>> + sipsave = sip->ch.next;
>>> + delete_sockinfo(sip);
>>> + sip = sipsave;
>>> + } else {
>>> + sip = sip->ch.next;
>>> + }
>>> + continue;
>>> + }
>>> +
>>> + /*
>>> + ** check if referred thread group is
>>> + ** already marked as 'indelete' during this
>>> + ** sockinfo search
>>> + ** if so, delete this sockinfo
>>> + */
>>> + spin_lock_irqsave(&thash[sip->tgh].lock, tflags);
>>> +
>>> + if (sip->tgp->state == INDELETE) {
>>> + spin_unlock_irqrestore(&thash[sip->tgh].lock,
>>> + tflags);
>>> + sipsave = sip->ch.next;
>>> + delete_sockinfo(sip);
>>> + sip = sipsave;
>>> + continue;
>>> + }
>>> +
>>> + /*
>>> + ** check if referred thread group still exists;
>>> + ** this step will be skipped if we already verified
>>> +** the existence of the thread group earlier during
>>> + ** this garbage collection cycle
>>> + */
>>> + if (sip->tgp->state != CHECKED) {
>>> + /*
>>> + ** connected thread group not yet verified
>>> + ** during this cycle, so check if it still
>>> + ** exists
>>> + ** if not, mark the thread group as 'indelete'
>>> + ** (it can not be deleted right now because
>>> + ** we might find other sockinfo's referring
>>> + ** to this thread group during the current
>>> + ** cycle) and delete this sockinfo
>>> + ** if the thread group exists, just mark
>>> + ** it as 'checked' for this cycle
>>> + */
>>> + rcu_read_lock();
>>> + pid = find_vpid(sip->tgp->id);
>>> + rcu_read_unlock();
>>> +
>>> + if (pid == NULL) {
>>> + sip->tgp->state = INDELETE;
>>> + spin_unlock_irqrestore(
>>> + &thash[sip->tgh].lock, tflags);
>>> +
>>> + sipsave = sip->ch.next;
>>> + delete_sockinfo(sip);
>>> + sip = sipsave;
>>> + continue;
>>> + } else {
>>> + sip->tgp->state = CHECKED;
>>> + }
>>> + }
>>> +
>>> + spin_unlock_irqrestore(&thash[sip->tgh].lock, tflags);
>>> +
>>> + /*
>>> + ** check if this sockinfo has a relation with a thread
>>> + ** if not, skip further handling of this sockinfo
>>> + */
>>> + if (sip->thp == NULL) {
>>> + sip = sip->ch.next;
>>> + continue;
>>> + }
>>> +
>>> + /*
>>> + ** check if referred thread is already marked
>>> + ** as 'indelete' during this sockinfo search
>>> + ** if so, break connection
>>> + */
>>> + spin_lock_irqsave(&thash[sip->thh].lock, tflags);
>>> +
>>> + if (sip->thp->state == INDELETE) {
>>> + spin_unlock_irqrestore(&thash[sip->thh].lock,
>>> + tflags);
>>> + sip->thp = NULL;
>>> + sip = sip->ch.next;
>>> + continue;
>>> + }
>>> +
>>> + /*
>>> + ** check if referred thread is already checked
>>> + ** during this sockinfo search
>>> + */
>>> + if (sip->thp->state == CHECKED) {
>>> + spin_unlock_irqrestore(&thash[sip->thh].lock,
>>> + tflags);
>>> + sip = sip->ch.next;
>>> + continue;
>>> + }
>>> +
>>> + /*
>>> + ** connected thread not yet verified
>>> + ** check if it still exists
>>> + ** if not, mark it as 'indelete' and break connection
>>> + ** if thread exists, mark it 'checked'
>>> + */
>>> + rcu_read_lock();
>>> + pid = find_vpid(sip->thp->id);
>>> + rcu_read_unlock();
>>> +
>>> + if (pid == NULL) {
>>> + sip->thp->state = INDELETE;
>>> + sip->thp = NULL;
>>> + } else {
>>> + sip->thp->state = CHECKED;
>>> + }
>>> +
>>> + spin_unlock_irqrestore(&thash[sip->thh].lock, tflags);
>>> +
>>> + /*
>>> + ** check if a TCP port has not been used
>>> + ** for some time --> destroy even if the thread
>>> + ** (group) is still there
>>> + */
>>> + if (sip->proto == IPPROTO_TCP &&
>>> + sip->lastact + GCMAXTCP < jiffies_64) {
>>> + sipsave = sip->ch.next;
>>> + delete_sockinfo(sip);
>>> + sip = sipsave;
>>> + continue;
>>> + }
>>> +
>>> + /*
>>> + ** check if a UDP port has not been used
>>> + ** for some time --> destroy even if the thread
>>> + ** (group) is still there
>>> + ** e.g. outgoing DNS requests (to remote port 53) are
>>> + ** issued every time with another source port being
>>> + ** a new object that should not be kept too long;
>>> + ** local well-known ports are useful to keep
>>> + */
>>> + if (sip->proto == IPPROTO_UDP &&
>>> + sip->lastact + GCMAXUDP < jiffies_64 &&
>>> + sip->key.udp > 1024) {
>>> + sipsave = sip->ch.next;
>>> + delete_sockinfo(sip);
>>> + sip = sipsave;
>>> + continue;
>>> + }
>>> +
>>> + sip = sip->ch.next;
>>> + }
>>> +
>>> + spin_unlock_irqrestore(&shash[i].lock, sflags);
>>> + }
>>> +}
>>> +
>>> +/*
>>> +** remove taskinfo structures of finished tasks from hash list
>>> +*/
>>> +static void
>>> +gctaskinfo()
>>> +{
>>> + int i;
>>> + struct taskinfo *tip, *tipsave;
>>> + unsigned long tflags;
>>> + struct pid *pid;
>>> +
>>> + /*
>>> + ** go through all taskinfo hash buckets
>>> + */
>>> + for (i=0; i < TBUCKS; i++) {
>>> + if (thash[i].ch.next == (void *)&thash[i].ch)
>>> + continue; // quick return without lock
>>> +
>>> + spin_lock_irqsave(&thash[i].lock, tflags);
>>> +
>>> + tip = thash[i].ch.next;
>>> +
>>> + /*
>>> + ** check all taskinfo structs chained to this bucket
>>> + */
>>> + while (tip != (void *)&thash[i].ch) {
>>> + switch (tip->state) {
>>> + /*
>>> + ** remove INDELETE tasks from the hash buckets
>>> + ** -- move thread group to exitlist
>>> + ** -- destroy thread right away
>>> + */
>>> + case INDELETE:
>>> + tipsave = tip->ch.next;
>>> +
>>> + if (tip->type == 'g')
>>> + move_taskinfo(tip); // thread group
>>> + else
>>> + delete_taskinfo(tip); // thread
>>> +
>>> + tip = tipsave;
>>> + break;
>>> +
>>> + case CHECKED:
>>> + tip->state = 0;
>>> + tip = tip->ch.next;
>>> + break;
>>> +
>>> + default: // not checked yet
>>> + rcu_read_lock();
>>> + pid = find_vpid(tip->id);
>>> + rcu_read_unlock();
>>> +
>>> + if (pid == NULL) {
>>> + tipsave = tip->ch.next;
>>> +
>>> + if (tip->type == 'g')
>>> + move_taskinfo(tip);
>>> + else
>>> + delete_taskinfo(tip);
>>> +
>>> + tip = tipsave;
>>> + } else {
>>> + tip = tip->ch.next;
>>> + }
>>> + }
>>> + }
>>> +
>>> + spin_unlock_irqrestore(&thash[i].lock, tflags);
>>> + }
>>> +}
>>> +
>>> +
>>> +/*
>>> +** remove all sockinfo structs
>>> +*/
>>> +static void
>>> +wipesockinfo()
>>> +{
>>> + struct sockinfo *sip, *sipsave;
>>> + int i;
>>> + unsigned long sflags;
>>> +
>>> + for (i=0; i < SBUCKS; i++) {
>>> + spin_lock_irqsave(&shash[i].lock, sflags);
>>> +
>>> + sip = shash[i].ch.next;
>>> +
>>> + /*
>>> + ** free all structs chained in one bucket
>>> + */
>>> + while (sip != (void *)&shash[i].ch) {
>>> + sipsave = sip->ch.next;
>>> + delete_sockinfo(sip);
>>> + sip = sipsave;
>>> + }
>>> +
>>> + spin_unlock_irqrestore(&shash[i].lock, sflags);
>>> + }
>>> +}
>>> +
>>> +/*
>>> +** remove all taskinfo structs from hash list
>>> +*/
>>> +static void
>>> +wipetaskinfo()
>>> +{
>>> + struct taskinfo *tip, *tipsave;
>>> + int i;
>>> + unsigned long tflags;
>>> +
>>> + for (i=0; i < TBUCKS; i++) {
>>> + spin_lock_irqsave(&thash[i].lock, tflags);
>>> +
>>> + tip = thash[i].ch.next;
>>> +
>>> + /*
>>> + ** free all structs chained in one bucket
>>> + */
>>> + while (tip != (void *)&thash[i].ch) {
>>> + tipsave = tip->ch.next;
>>> + delete_taskinfo(tip);
>>> + tip = tipsave;
>>> + }
>>> +
>>> + spin_unlock_irqrestore(&thash[i].lock, tflags);
>>> + }
>>> +}
>>> +
>>> +/*
>>> +** remove all taskinfo structs from exit list
>>> +*/
>>> +static void
>>> +wipetaskexit()
>>> +{
>>> + gctaskexit();
>>> +}
>>> +
>>> +/*
>>> +** move one taskinfo struct from hash bucket to exitlist
>>> +*/
>>> +static void
>>> +move_taskinfo(struct taskinfo *tip)
>>> +{
>>> + unsigned long flags;
>>> +
>>> + /*
>>> + ** remove from hash list
>>> + */
>>> + ((struct taskinfo *)tip->ch.next)->ch.prev = tip->ch.prev;
>>> + ((struct taskinfo *)tip->ch.prev)->ch.next = tip->ch.next;
>>> +
>>> + spin_lock_irqsave(&nrtlock, flags);
>>> + nrt--;
>>> + spin_unlock_irqrestore(&nrtlock, flags);
>>> +
>>> + /*
>>> + ** add to exitlist
>>> + */
>>> + tip->ch.next = NULL;
>>> + tip->state = FINISHED;
>>> + tip->exittime = jiffies_64;
>>> +
>>> + spin_lock_irqsave(&exitlock, flags);
>>> +
>>> + if (exittail) { // list filled?
>>> + exittail->ch.next = tip;
>>> + exittail = tip;
>>> + } else { // list empty
>>> + exithead = exittail = tip;
>>> + }
>>> +
>>> + nre++;
>>> +
>>> + wake_up_interruptible(&exitlist_filled);
>>> +
>>> + spin_unlock_irqrestore(&exitlock, flags);
>>> +}
>>> +
>>> +/*
>>> +** remove one taskinfo struct from the hash bucket chain
>>> +*/
>>> +static void
>>> +delete_taskinfo(struct taskinfo *tip)
>>> +{
>>> + unsigned long flags;
>>> +
>>> + ((struct taskinfo *)tip->ch.next)->ch.prev = tip->ch.prev;
>>> + ((struct taskinfo *)tip->ch.prev)->ch.next = tip->ch.next;
>>> +
>>> + kmem_cache_free(ticache, tip);
>>> +
>>> + spin_lock_irqsave(&nrtlock, flags);
>>> + nrt--;
>>> + spin_unlock_irqrestore(&nrtlock, flags);
>>> +}
>>> +
>>> +/*
>>> +** remove one sockinfo struct from the hash bucket chain
>>> +*/
>>> +static void
>>> +delete_sockinfo(struct sockinfo *sip)
>>> +{
>>> + unsigned long flags;
>>> +
>>> + ((struct sockinfo *)sip->ch.next)->ch.prev = sip->ch.prev;
>>> + ((struct sockinfo *)sip->ch.prev)->ch.next = sip->ch.next;
>>> +
>>> + kmem_cache_free(sicache, sip);
>>> +
>>> + spin_lock_irqsave(&nrslock, flags);
>>> + nrs--;
>>> + spin_unlock_irqrestore(&nrslock, flags);
>>> +}
>>> +
>>> +/*
>>> +** read function for /proc/netatop
>>> +*/
>>> +static int
>>> +netatop_show(struct seq_file *m, void *v)
>>> +{
>>> + seq_printf(m, "tcpsndpacks: %12lu (unident: %9lu)\n"
>>> + "tcprcvpacks: %12lu (unident: %9lu)\n"
>>> + "udpsndpacks: %12lu (unident: %9lu)\n"
>>> + "udprcvpacks: %12lu (unident: %9lu)\n\n"
>>> + "icmpsndpacks: %12lu\n"
>>> + "icmprcvpacks: %12lu\n\n"
>>> + "#sockinfo: %12lu (overflow: %8lu)\n"
>>> + "#taskinfo: %12lu (overflow: %8lu)\n"
>>> + "#taskexit: %12lu\n\n"
>>> + "modversion: %14s\n",
>>> + tcpsndpacks, unidenttcpsndpacks,
>>> + tcprcvpacks, unidenttcprcvpacks,
>>> + udpsndpacks, unidentudpsndpacks,
>>> + udprcvpacks, unidentudprcvpacks,
>>> + icmpsndpacks, icmprcvpacks,
>>> + nrs, nrs_ovf,
>>> + nrt, nrt_ovf,
>>> + nre, NETATOPVERSION);
>>> + return 0;
>>> +}
>>> +
>>> +static int
>>> +netatop_open(struct inode *inode, struct file *file)
>>> +{
>>> + return single_open(file, netatop_show, NULL);
>>> +}
>>> +
>>> +/*
>>> +** called when user space issues the getsockopt() system call
>>> +*/
>>> +static int
>>> +getsockopt(struct sock *sk, int cmd, void __user *user, int *len)
>>> +{
>>> + int bt;
>>> + struct taskinfo *tip;
>>> + char tasktype = 't';
>>> + struct netpertask npt;
>>> + unsigned long tflags;
>>> +
>>> + /*
>>> + ** verify the proper privileges
>>> + */
>>> + if (!capable(CAP_NET_ADMIN))
>>> + return -EPERM;
>>> +
>>> + /*
>>> + ** react on command
>>> + */
>>> + switch (cmd) {
>>> + case NETATOP_PROBE:
>>> + break;
>>> +
>>> + case NETATOP_FORCE_GC:
>>> + garbage_collector();
>>> + break;
>>> +
>>> + case NETATOP_EMPTY_EXIT:
>>> + while (nre > 0) {
>>> + if (wait_event_interruptible(exitlist_empty, nre == 0))
>>> + return -ERESTARTSYS;
>>> + }
>>> + break;
>>> +
>>> + case NETATOP_GETCNT_EXIT:
>>> + if (nre == 0)
>>> + wake_up_interruptible(&exitlist_empty);
>>> +
>>> + if (*len < sizeof(pid_t))
>>> + return -EINVAL;
>>> +
>>> + if (*len > sizeof npt)
>>> + *len = sizeof npt;
>>> +
>>> + spin_lock_irqsave(&exitlock, tflags);
>>> +
>>> + /*
>>> + ** check if an exited process is present
>>> + ** if not, wait for it...
>>> + */
>>> + while (nre == 0) {
>>> + spin_unlock_irqrestore(&exitlock, tflags);
>>> +
>>> + if ( wait_event_interruptible(exitlist_filled, nre > 0))
>>> + return -ERESTARTSYS;
>>> +
>>> + spin_lock_irqsave(&exitlock, tflags);
>>> + }
>>> +
>>> + /*
>>> + ** get first process from exitlist and remove it from there
>>> + */
>>> + tip = exithead;
>>> +
>>> + if ( (exithead = tip->ch.next) == NULL)
>>> + exittail = NULL;
>>> +
>>> + nre--;
>>> +
>>> + spin_unlock_irqrestore(&exitlock, tflags);
>>> +
>>> + /*
>>> + ** pass relevant info to user mode
>>> + ** and free taskinfo struct
>>> + */
>>> + npt.id = tip->id;
>>> + npt.tc = tip->tc;
>>> + npt.btime = tip->btime;
>>> + memcpy(npt.command, tip->command, COMLEN);
>>> +
>>> + if (copy_to_user(user, &npt, *len) != 0)
>>> + return -EFAULT;
>>> +
>>> + kmem_cache_free(ticache, tip);
>>> +
>>> + return 0;
>>> +
>>> + case NETATOP_GETCNT_TGID:
>>> + tasktype = 'g'; // fall through to NETATOP_GETCNT_PID handling
>>> +
>>> + case NETATOP_GETCNT_PID:
>>> + if (*len < sizeof(pid_t))
>>> + return -EINVAL;
>>> +
>>> + if (*len > sizeof npt)
>>> + *len = sizeof npt;
>>> +
>>> + if (copy_from_user(&npt, user, *len) != 0)
>>> + return -EFAULT;
>>> +
>>> + /*
>>> + ** search requested id in taskinfo hash
>>> + */
>>> + bt = THASH(npt.id, tasktype); // calculate hash
>>> +
>>> + if (thash[bt].ch.next == (void *)&thash[bt].ch)
>>> + return -ESRCH; // quick return without lock
>>> +
>>> + spin_lock_irqsave(&thash[bt].lock, tflags);
>>> +
>>> + tip = thash[bt].ch.next;
>>> +
>>> + while (tip != (void *)&thash[bt].ch) {
>>> + // is this the one?
>>> + if (tip->id == npt.id && tip->type == tasktype) {
>>> + /*
>>> + ** found: copy results to user space
>>> + */
>>> + memcpy(npt.command, tip->command, COMLEN);
>>> + npt.tc = tip->tc;
>>> + npt.btime = tip->btime;
>>> +
>>> + spin_unlock_irqrestore(&thash[bt].lock, tflags);
>>> +
>>> + if (copy_to_user(user, &npt, *len) != 0)
>>> + return -EFAULT;
>>> + else
>>> + return 0;
>>> + }
>>> +
>>> + tip = tip->ch.next;
>>> + }
>>> +
>>> + spin_unlock_irqrestore(&thash[bt].lock, tflags);
>>> + return -ESRCH;
>>> +
>>> + default:
>>> + printk(KERN_INFO "unknown getsockopt command %d\n", cmd);
>>> + return -EINVAL;
>>> + }
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +/*
>>> +** kernel mode thread: initiate garbage collection every N seconds
>>> +*/
>>> +static int netatop_thread(void *dummy)
>>> +{
>>> + while (!kthread_should_stop()) {
>>> + /*
>>> + ** do garbage collection
>>> + */
>>> + garbage_collector();
>>> +
>>> + /*
>>> + ** wait a while
>>> + */
>>> + (void) schedule_timeout_interruptible(GCINTERVAL);
>>> + }
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +#if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 13, 0)
>>> +static int __net_init netatop_nf_register(struct net *net)
>>> +{
>>> + int err;
>>> +
>>> + err = nf_register_net_hooks(net, &hookin_ipv4, 1);
>>> + if (err)
>>> + return err;
>>> +
>>> + err = nf_register_net_hooks(net, &hookout_ipv4, 1);
>>> + if (err)
>>> + nf_unregister_net_hooks(net, &hookin_ipv4, 1);
>>> +
>>> + return err;
>>> +}
>>> +
>>> +static void __net_exit netatop_nf_unregister(struct net *net)
>>> +{
>>> + nf_unregister_net_hooks(net, &hookin_ipv4, 1);
>>> + nf_unregister_net_hooks(net, &hookout_ipv4, 1);
>>> +}
>>> +
>>> +static struct pernet_operations netatop_net_ops = {
>>> + .init = netatop_nf_register,
>>> + .exit = netatop_nf_unregister,
>>> +};
>>> +#endif
>>> +
>>> +/*
>>> +** called when module loaded
>>> +*/
>>> +int
>>> +init_module()
>>> +{
>>> + int i;
>>> +
>>> + /*
>>> + ** initialize caches for taskinfo and sockinfo
>>> + */
>>> + ticache = kmem_cache_create("Netatop_taskinfo",
>>> + sizeof (struct taskinfo), 0, 0, NULL);
>>> + if (!ticache)
>>> + return -EFAULT;
>>> +
>>> + sicache = kmem_cache_create("Netatop_sockinfo",
>>> + sizeof (struct sockinfo), 0, 0, NULL);
>>> + if (!sicache) {
>>> + kmem_cache_destroy(ticache);
>>> + return -EFAULT;
>>> + }
>>> +
>>> + /*
>>> + ** initialize hash table for taskinfo and sockinfo
>>> + */
>>> + for (i=0; i < TBUCKS; i++) {
>>> + thash[i].ch.next = &thash[i].ch;
>>> + thash[i].ch.prev = &thash[i].ch;
>>> + spin_lock_init(&thash[i].lock);
>>> + }
>>> +
>>> + for (i=0; i < SBUCKS; i++) {
>>> + shash[i].ch.next = &shash[i].ch;
>>> + shash[i].ch.prev = &shash[i].ch;
>>> + spin_lock_init(&shash[i].lock);
>>> + }
>>> +
>>> + getboottime(&boottime);
>>> +
>>> + /*
>>> + ** register getsockopt for user space communication
>>> + */
>>> + if (nf_register_sockopt(&sockopts) < 0) {
>>> + kmem_cache_destroy(ticache);
>>> + kmem_cache_destroy(sicache);
>>> + return -1;
>>> + }
>>> +
>>> + /*
>>> + ** create a new kernel mode thread for time-driven garbage collection
>>> + ** after creation, the thread waits until it is woken up
>>> + */
>>> + knetatop_task = kthread_create(netatop_thread, NULL, "knetatop");
>>> +
>>> + if (IS_ERR(knetatop_task)) {
>>> + nf_unregister_sockopt(&sockopts);
>>> + kmem_cache_destroy(ticache);
>>> + kmem_cache_destroy(sicache);
>>> + return -1;
>>> + }
>>> +
>>> + /*
>>> + ** prepare hooks for input and output packets
>>> + */
>>> + hookin_ipv4.hooknum = NF_IP_LOCAL_IN; // input packs
>>> + hookin_ipv4.hook = ipv4_hookin; // func to call
>>> + hookin_ipv4.pf = PF_INET; // IPV4 packets
>>> + hookin_ipv4.priority = NF_IP_PRI_FIRST; // highest prio
>>> +
>>> + hookout_ipv4.hooknum = NF_IP_LOCAL_OUT; // output packs
>>> + hookout_ipv4.hook = ipv4_hookout; // func to call
>>> + hookout_ipv4.pf = PF_INET; // IPV4 packets
>>> + hookout_ipv4.priority = NF_IP_PRI_FIRST; // highest prio
>>> +
>>> +#if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 13, 0)
>>> + i = register_pernet_subsys(&netatop_net_ops);
>>> + if (i)
>>> + return i;
>>> +#else
>>> + nf_register_hook(&hookin_ipv4); // register hook
>>> + nf_register_hook(&hookout_ipv4); // register hook
>>> +#endif
>>> +
>>> + /*
>>> + ** create a /proc-entry to produce status-info on request
>>> + */
>>> + proc_create("netatop", 0444, NULL, &netatop_proc_fops);
>>> +
>>> + /*
>>> +** all administration prepared; kick off kernel mode thread
>>> + */
>>> + wake_up_process(knetatop_task);
>>> +
>>> + return 0; // return success
>>> +}
>>> +
>>> +/*
>>> +** called when module unloaded
>>> +*/
>>> +void
>>> +cleanup_module()
>>> +{
>>> + /*
>>> + ** tell kernel daemon to stop
>>> + */
>>> + kthread_stop(knetatop_task);
>>> +
>>> + /*
>>> + ** unregister netfilter hooks and other miscellaneous stuff
>>> + */
>>> +#if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 13, 0)
>>> + unregister_pernet_subsys(&netatop_net_ops);
>>> +#else
>>> + nf_unregister_hook(&hookin_ipv4);
>>> + nf_unregister_hook(&hookout_ipv4);
>>> +#endif
>>> +
>>> + remove_proc_entry("netatop", NULL);
>>> +
>>> + nf_unregister_sockopt(&sockopts);
>>> +
>>> + /*
>>> + ** destroy allocated stats
>>> + */
>>> + wipesockinfo();
>>> + wipetaskinfo();
>>> + wipetaskexit();
>>> +
>>> + /*
>>> + ** destroy caches
>>> + */
>>> + kmem_cache_destroy(ticache);
>>> + kmem_cache_destroy(sicache);
>>> +}
>>> +
>>> +module_init(init_module);
>>> +module_exit(cleanup_module);
>>> diff --git a/drivers/staging/netatop/netatop.h b/drivers/staging/netatop/netatop.h
>>> new file mode 100644
>>> index 000000000000..a14e6fa5db52
>>> --- /dev/null
>>> +++ b/drivers/staging/netatop/netatop.h
>>> @@ -0,0 +1,47 @@
>>> +#define COMLEN 16
>>> +
>>> +struct taskcount {
>>> + unsigned long long tcpsndpacks;
>>> + unsigned long long tcpsndbytes;
>>> + unsigned long long tcprcvpacks;
>>> + unsigned long long tcprcvbytes;
>>> +
>>> + unsigned long long udpsndpacks;
>>> + unsigned long long udpsndbytes;
>>> + unsigned long long udprcvpacks;
>>> + unsigned long long udprcvbytes;
>>> +
>>> + /* space for future extensions */
>>> +};
>>> +
>>> +struct netpertask {
>>> + pid_t id; // tgid or tid (depending on command)
>>> + unsigned long btime;
>>> + char command[COMLEN];
>>> +
>>> + struct taskcount tc;
>>> +};
>>> +
>>> +
>>> +/*
>>> +** getsockopt commands
>>> +*/
>>> +#define NETATOP_BASE_CTL 15661
>>> +
>>> +// just probe if the netatop module is active
>>> +#define NETATOP_PROBE (NETATOP_BASE_CTL)
>>> +
>>> +// force garbage collection to make finished processes available
>>> +#define NETATOP_FORCE_GC (NETATOP_BASE_CTL+1)
>>> +
>>> +// wait until all finished processes are read (blocks until done)
>>> +#define NETATOP_EMPTY_EXIT (NETATOP_BASE_CTL+2)
>>> +
>>> +// get info for finished process (blocks until available)
>>> +#define NETATOP_GETCNT_EXIT (NETATOP_BASE_CTL+3)
>>> +
>>> +// get counters for thread group (i.e. process): input is 'id' (pid)
>>> +#define NETATOP_GETCNT_TGID (NETATOP_BASE_CTL+4)
>>> +
>>> +// get counters for thread: input is 'id' (tid)
>>> +#define NETATOP_GETCNT_PID (NETATOP_BASE_CTL+5)
>>> diff --git a/drivers/staging/netatop/netatopversion.h b/drivers/staging/netatop/netatopversion.h
>>> new file mode 100644
>>> index 000000000000..eb011f5a13c9
>>> --- /dev/null
>>> +++ b/drivers/staging/netatop/netatopversion.h
>>> @@ -0,0 +1,2 @@
>>> +#define NETATOPVERSION "2.0"
>>> +#define NETATOPDATE "2017/09/30 12:06:19"
>>>
>>
>
>