Summary of the netfilter developer workshop 2003

Harald Welte

   $Revision: 1.1 $

   Copyright © 2003 Harald Welte <laforge@netfilter.org>
     _________________________________________________________

   Table of Contents
   1. Introduction
   2. Topics

        2.1. pkttables
        2.2. Wallfire
        2.3. Conntrack Failover
        2.4. Logging Framework
        2.5. proc patch
        2.6. raw table
        2.7. TCP window tracking
        2.8. nf-hipac
        2.9. bridgewalling
        2.10. Distributions of work within the project
        2.11. test tools

              2.11.1. regression testing
              2.11.2. benchmarking

        2.12. GPL Violations
        2.13. patch-o-matic

   3. TODO list for 2.6.0-testX

1. Introduction

   foo
     _________________________________________________________

2. Topics

2.1. pkttables

   pkttables is the "next generation" packet filter for linux
   2.6.x and beyond.

   pkttables is a whole framework for layer3 independent
   management of packet filtering rules. The idea is to reduce
   the code replication between ip_tables.c, ip6_tables.c,
   arp_tables.c on the kernel side, as well as the userspace
   counterpart (libip4/6tc, iptables.c/ip6tables.c). They all
   have to deal with the same set of problems: communicating
   packet filter rules between userspace and kernel,
   add/remove/replace/insert rules into existing rulesets, and
   matching a particular packet against a ruleset. The management
   functions in particular, such as ruleset loading from/to
   userspace, do not depend on the l3 protocol at all - but still
   we replicate this functionality in the current 2.4.x packet
   filter(s).

   The major parts of pkttables are

    1. pkt_tables - the in-kernel pkttables core
    2. pkt_tables_ipv4 - the ipv4 incarnation of pkt_tables
    3. pkttable_ipv4_filter - the ipv4 filter table instance
    4. pkttnetlink - the in-kernel part of a netlink-based
       userspace interface
    5. pktt_foo.c - the in-kernel part of a match extension
    6. pktt_FOO.c - the in-kernel part of a target extension
    7. libpkttnetlink - userspace library for low-layer
       communication
    8. libpkttables - userspace library providing high-layer API
       for apps
    9. pkttables - commandline program for ruleset manipulation
       by the admin

   During the workshop, the following requirements/comments on
   the current pkttables design were made:

    1. Use TLVs for the data protocol used in all netlink
       messages. This ensures future compatibility with netlink2
       (see the related IETF ForCES draft), and allows
       applications like saving/restoring iptables rules in
       binary architecture-independent form. It also introduces a
       clear distinction between the in-kernel representation of
       [matchinfo] data and the communication structures used
       between kernel and userspace. As a by-product it solves
       any kernel64/user32 issues on SPARC64 and other similar
       architectures. Also, versioning can easily be implemented,
       new fields can easily be added, ...
    2. Transactions. The goal is to make everything transaction
       based. However, full support for an arbitrary number of
       transactions is difficult to implement. As a minimum
       requirement: only one open transaction at a time. Once the
       transaction is started, all operations are performed on a
       copy of the ruleset. Once the transaction is committed,
       the rulesets are atomically replaced.
    3. Versioning. Versioning is handled via sub-TLVs of the
       matchinfo TLV. Every plugin can define its own sub-TLVs
       within the matchinfo TLV. This way, they can invent new
       TLVs on their own. A new version of a particular plugin
       would introduce a new tag. However, both kernel and
       userspace still support the old tag. To find out which
       tags are supported on both sides, a discovery mechanism
       needs to be in place.
    4. The libpkttables core can do all the encoding if the
       plugin specifies the tag for a given parameter. Pro: easy.
       Con: it is unclear how to register different TLVs
       depending on the kernel version. Idea: include min/max
    5. Matchinfo TLV tags are allocated at patch-o-matic
       inclusion time. This way we have a central way of
       assigning the numbers and assuring no overlap (NANANA). We
       also specify a range of experimental tags for external
       modules or modules that are not included in patch-o-matic
       yet.
    6. i18n. Since libpkttables also returns the
       descriptions/help messages for individual plugins, support
       for i18n should be included in the architecture.
    7. manpage generation. It was noted that this could be done
       automatically, if all the data is hidden within
       libpkttables.
    8. Separate target/policy. This makes it explicit whether
       a target terminates or continues traversal within the
       chain/table. Currently we have terminating and
       non-terminating targets. Apart from documentation and/or
       experience of the user, there is no way of telling them
       apart. In pkttables, every target tells the core which
       policies (PKTT_ACCEPT, PKTT_DROP, PKTT_CONTINUE, ...) it
       supports. The user then has to specify a '-P ACCEPT'
       option on every rule in order to specify the desired
       behaviour.
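
   The TLV and versioning requirements above (items 1 and 3) can
   be illustrated with a minimal sketch. All tag numbers, the
   2-byte tag / 2-byte length header and the port match are
   assumptions made up for this example; the workshop did not fix
   a wire format, and real netlink attributes additionally use
   alignment padding, which is left out here.

```python
import struct

# Hypothetical tag numbers, chosen only for this example.
TAG_MATCHINFO = 1
TAG_PORT_V1 = 16   # older plugin version: a single port
TAG_PORT_V2 = 17   # newer plugin version: a port range

# Assumed header layout: tag, value length; both in network byte order.
HEADER = struct.Struct("!HH")


def tlv_encode(tag, value):
    """Pack one TLV: 2-byte tag, 2-byte value length, then the value."""
    return HEADER.pack(tag, len(value)) + value


def tlv_decode(buf):
    """Return the list of (tag, value) pairs contained in buf."""
    items = []
    offset = 0
    while offset < len(buf):
        tag, length = HEADER.unpack_from(buf, offset)
        offset += HEADER.size
        items.append((tag, buf[offset:offset + length]))
        offset += length
    return items


def encode_port_match(low, high, peer_tags):
    """Wrap a port match in a matchinfo TLV.

    peer_tags is the set of sub-TLV tags the other side reported
    during discovery; the newest common tag is preferred.
    """
    if TAG_PORT_V2 in peer_tags:
        sub = tlv_encode(TAG_PORT_V2, struct.pack("!HH", low, high))
    elif TAG_PORT_V1 in peer_tags and low == high:
        sub = tlv_encode(TAG_PORT_V1, struct.pack("!H", low))
    else:
        raise ValueError("no common tag can express this match")
    return tlv_encode(TAG_MATCHINFO, sub)
```

   Because every value carries its own tag and length, a receiver
   can step over sub-TLVs it does not know, and the byte layout
   does not depend on the word size of either side.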

   As for kernel inclusion, it was decided that pkttables will be
   included as an additional netfilter subsystem (in parallel to
   the current iptables code) during the early 2.7.x kernel
   series -
   and then backported into 2.6.current at that time.
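
   The minimum transaction requirement from the list above (one
   open transaction, changes applied to a copy, atomic replace on
   commit) could look roughly like the following sketch. The class
   and method names are hypothetical, and a plain reference
   assignment stands in for whatever atomic replacement the kernel
   side would actually use.

```python
import copy


class RuleTable:
    """Toy ruleset that allows at most one open transaction at a time."""

    def __init__(self):
        self.active = []      # ruleset that packets are matched against
        self._shadow = None   # working copy while a transaction is open

    def begin(self):
        if self._shadow is not None:
            raise RuntimeError("another transaction is already open")
        self._shadow = copy.deepcopy(self.active)

    def append(self, rule):
        if self._shadow is None:
            raise RuntimeError("no open transaction")
        self._shadow.append(rule)

    def commit(self):
        if self._shadow is None:
            raise RuntimeError("no open transaction")
        # The working copy replaces the active ruleset in one step.
        self.active, self._shadow = self._shadow, None

    def abort(self):
        # Dropping the working copy leaves the active ruleset untouched.
        self._shadow = None
```

   Until commit() is called, packet matching only ever sees the
   old ruleset, so a half-finished update is never visible.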
     _________________________________________________________

2.2. Wallfire

   Herve Eychenne gave a presentation about the wallfire project.
   Please refer to his slides and/or the wallfire documentation
   to learn more about it. This summary will just cover the
   issues related to netfilter/iptables development.

     * verbose error reporting (netlink messages)
     * slow rule manipulation. Incremental changes to the ruleset
       are way too slow. This should be improved with
       iptables-1.3.x (including libiptc2) and the new
       mark_source_chains() implementation. A real 'fix' is only
       an incremental kernel/userspace interface, like pkttables
       will introduce.
     * converter to convert shell script into iptables-save
       format (shellscript?)
     * MASQUERADE: don't flush at ifdown, but at ifup (and cmp to
       old address). The current flush-at-ifdown policy deletes
       all NAT mappings, even if our ISP gives us the same IP
       again.
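
   The proposed MASQUERADE behaviour from the last item can be
   sketched as follows. This is only a model of the policy (keep
   mappings at ifdown, compare addresses at ifup); the class and
   its attributes are hypothetical and do not mirror the kernel
   code.

```python
class MasqueradeState:
    """Toy model of the proposed flush-at-ifup policy."""

    def __init__(self):
        self.last_address = {}   # interface name -> address at last ifup
        self.mappings = {}       # interface name -> list of NAT mappings

    def ifdown(self, ifname):
        # Proposed policy: keep the mappings, the address may come back.
        pass

    def ifup(self, ifname, address):
        """Flush only if the address changed. Returns True if flushed."""
        previous = self.last_address.get(ifname)
        self.last_address[ifname] = address
        if previous is not None and previous != address:
            self.mappings[ifname] = []
            return True
        return False
```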
     _________________________________________________________

2.3. Conntrack Failover

   Krisztian Kovacs did an experimental implementation of the
   netfilter failover solution described in the OLS2002 paper by
   Harald Welte. The implementation should be considered a
   proof of concept. The code has not been released yet, and has
   not been tested outside a simulated UML environment.

   The author and Harald will work together on improving the code
   and on testing it on physical machines. If needed,
   Astaro would be happy to provide the testlab and pay for
   travel costs.
     _________________________________________________________

2.4. Logging Framework

   Jozsef implemented a more general abstraction for a logging
   interface. This logging interface looks a bit like the current
   queue handler. Any module can register a logging handler with
   the core, and everybody who wants to log a packet just calls
   the logging handler. This way, packets can be logged from
   outside the LOG/ULOG target - like the TRACE patch or tcp
   window tracking.
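
   The register-and-call pattern described above can be sketched
   like this. The function names and the idea of keeping one
   handler per protocol family are assumptions for illustration;
   they are not taken from the patch itself.

```python
_handlers = {}  # protocol family -> logging callable (assumed keying)


def register_logger(family, handler):
    """Register the logging handler for a protocol family."""
    if family in _handlers:
        raise ValueError("a logger is already registered for " + family)
    _handlers[family] = handler


def unregister_logger(family):
    """Remove the handler for a family, if there is one."""
    _handlers.pop(family, None)


def log_packet(family, prefix, packet):
    """Called by any module that wants to log a packet."""
    handler = _handlers.get(family)
    if handler is None:
        return False   # nobody registered, the packet is not logged
    handler(prefix, packet)
    return True
```

   A caller such as a TRACE implementation would then only need
   log_packet() and would not have to know whether the packet
   ends up in a LOG-style or a ULOG-style backend.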

   The patch is pending for 2.4 and 2.6 kernel inclusion. It
   currently has to patch iptables userspace as well. If the
   userspace patch can be removed, we would be able to include it
   without any incompatibilities.
     _________________________________________________________

2.5. proc patch

   - submit proc patch quickly
     _________________________________________________________

2.6. raw table

   Jozsef Kadlecsik has implemented the 'raw' table. This table
   registers at PREROUTING with a higher priority than connection
   tracking. This way a NOTRACK target can be used to exempt
   certain packets from being tracked by connection tracking. It
   also implements a TRACE target, that turns on a special bit in
   the packet. Later in the lifetime of the packet, any subsystem
   (currently just iptables) can print information about what
   happens to the packet (certain rule has matched, jumping to
   different chain, applying a NAT mapping, ...).
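
   The ordering at PREROUTING can be modelled with a small
   simulation. The packet is a plain dict, the two example rules
   are made up, and the priority numbers only illustrate that the
   raw table must sort before connection tracking; none of this
   is the actual kernel interface.

```python
# Smaller numbers run first. The concrete values are example numbers.
PRI_RAW = -300
PRI_CONNTRACK = -200


def raw_table(pkt):
    """Stand-in for the raw table with one NOTRACK and one TRACE rule."""
    if pkt.get("dport") == 53:          # hypothetical NOTRACK rule
        pkt["notrack"] = True
    if pkt.get("src") == "10.0.0.1":    # hypothetical TRACE rule
        pkt["trace"] = True


def conntrack(pkt):
    """Stand-in for connection tracking."""
    if pkt.get("notrack"):
        return                          # exempted by the raw table
    pkt["tracked"] = True


def run_prerouting(pkt, hooks):
    """Call all hooks in priority order, logging for traced packets."""
    for _, hook in sorted(hooks, key=lambda h: h[0]):
        hook(pkt)
        if pkt.get("trace"):
            pkt.setdefault("log", []).append("passed " + hook.__name__)
    return pkt
```

   Since the raw table sorts first, its NOTRACK decision is
   already attached to the packet by the time connection tracking
   sees it.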

   This patch is supposed to be submitted immediately (after the
   logging patch, on which it depends).
     _________________________________________________________

2.7. TCP window tracking

   TCP window tracking should be submitted to both 2.4.x and 2.6.x
   kernels. It should be switched on by default in 2.6.x, _off_
   in 2.4.x.

     * add tcp rfc793_compatible sysctl, what is default?
     * if OOW packet has broken checksum, don't print OOW message
       (2.6 only)
     * Add sysctl to control logging of OOW packets
     _________________________________________________________

2.8. nf-hipac

   nf-hipac is an excellent packet filter for medium to large
   sized rulesets. Almost all the functionality of iptables is
   now supported, without any semantic difference.

   The idea is to put the current 2.4.x patch into patch-o-matic.
   This should give nf-hipac more users. The userspace program
   remains at the hipac.org website - since they want to know
   about the approximate number of hipac users.

   The 2.6.x version could go in at any time - once its
   implementation is finished. However, it is questionable if
   nf-hipac should be submitted before its user interface is
   unified with pkttnetlink.

   nf-hipac will not replace {ip,pkt}tables, but be an additional
   option available to the user. nf-hipac is esp. not suitable
   for embedded environments, where kernel size and memory usage
   are very restricted.
     _________________________________________________________

2.9. bridgewalling

   Bart de Schuymer gave an excellent overview of the current
   bridging implementation, including ebtables and its
   interaction with netfilter/iptables. The details can be found
   on the slides of Bart's presentation and/or other available
   documentation on the linux bridge implementation.

   It is noteworthy that ebtables is not a copy+paste 'port' of
   iptables, but a reimplementation based on the spirit of
   iptables. It has several semantic changes/additions, e.g.
   'WATCHERS' which are basically a way to have multiple targets
   in one rule. Those extensions make it difficult to unify
   ebtables with pkttables. Since pkttables is still mostly
   vaporware, a discussion on integration with ebtables should be
   postponed until pkttables works for ip and ipv6.
     _________________________________________________________

2.10. Distributions of work within the project

   Harald notes that most of the administrative and maintenance
   work in the project has gradually accumulated as his job.
   While this is not a problem in itself, it leaves him less time
   for exciting new development than he would like. The proposal
   is to offload some of this administrative/maintenance work to
   other people in order to free up some time.

     * website maintenance
       The netfilter/iptables homepage is only maintained at the
       most minimal level possible. All Harald does is add
       items to the 'News' sections and add new releases.
       However, there are lots of ideas how the website could be
       made more attractive. All we lack is somebody actually
       doing this job.
          + marketing-style information, performance data
          + better link collection, database based
          + personal developer homepages
          + developer diaries
          + FAQ-o-matic system
       As nobody has been volunteering for this job, we will post
       a call for volunteers.
     * security incident handling
       From the last couple of security incidents we've learned
       that we need somebody dedicated to the responsible job of
       dealing with security incidents: somebody who can
       concentrate on this, and to whom this is not just item
       number 999 on his TODO list.
          + report all security relevant issues
          + coordinate release of advisories with vendors
          + keep advisories on homepage up-to-date
       Oskar Andreasson has volunteered for this job
     * mailinglist moderation
       All netfilter lists are set to subscriber-only. This means
       we will catch lots of SPAM at the administrative interface
       (since it is just Bcc'ed to the list). Somebody should
       reliably check the admin interface at least once per day
       and take care of moderating the postings.
       As nobody has been volunteering for this job, we will post
       a call for volunteers.
     * t-shirt shipping
       The netfilter t-shirts are printed at a printing company
       in southern Germany. It would be very helpful to have
       somebody volunteering for the work of accepting payment
       (wire transfer / paypal) and shipping the individual
       T-Shirts via mail.
       Astaro has offered to see if they can somehow handle this
     * FAQ maintenance
       Once the new faq-o-matic system is in place, a moderator
       has to pick the most useful questions from the list of
       proposed questions, and write or include existing
       reference answers.
       As nobody has been volunteering for this job, we will post
       a call for volunteers.
       Gert Hansen will work on the faq-o-matic system itself.
     _________________________________________________________

2.11. test tools

   One of the biggest problems during netfilter/iptables
   development is the lack of test tools. Such tools are
   important for regression tests after changing the source code,
   but also necessary for performance analysis.
     _________________________________________________________

2.11.1. regression testing

   In the past, the iptables testsuite (cvs 'testsuite'
   directory) was used to do some minimal regression tests.
   However, the testsuite has become out of date, and most of the
   new features of the last two years don't have corresponding
   tests in the testsuite.

   The future of the testsuite was discussed, but according to
   the author's perception, no clear consensus was established.
     _________________________________________________________

2.11.2. benchmarking

   In almost any other field of computing, there are standard
   benchmarks: TPC for databases, httperf for webservers, ...
   Packet filters do not have such a standard performance test.
   The closest to a standard performance test is 'number of
   forwarded packets at a given number of rules without dropping
   packets'. While this might be suitable to test the performance
   of a router, it is certainly not suitable for advanced packet
   filters such as stateful firewalls. Performance depends
   on so many parameters: number of new connections per
   timeframe, number of state changes within a connection,
   distribution of l3 and l4 source/destination addresses, ...

   In order to test, benchmark and improve performance of the
   whole linux packet filtering subsystem, we'd need some
   sophisticated benchmarking utility. Only with such a utility
   would we be able to compare performance data before and after
   a code change in a given set of test cases.

   This problem is faced not only by netfilter developers, but
   also by vendors like Astaro and Smoothwall. Both are willing
   to raise some funds for the development of a decent packet
   filter benchmarking tool.

   Since nothing within such a tool would be specific to
   netfilter, it could be used to benchmark any packet filter /
   router / ... - and thus to compare performance between
   different products / projects as well.
     _________________________________________________________

2.11.2.1. Harald's connection generator

   Harald proposed to write something he calls a 'connection
   generator'. The idea is to have a set of machines on both ends
   of the firewall running the connection generator software. The
   software would have two configuration files.

   One file specifies so-called connection profiles. A profile
   describes the amount of data sent in each direction, the
   typical duration of the connection, etc. The other file (the
   test case) specifies the number of connections of each
   profile, and the addresses between which they should be
   established: 100 'http' connections, from 192.168.100.0/24,
   random source to 10.1.2.3:80.
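
   The two configuration files could map onto data structures
   like the ones below. This is only a sketch of the idea: the
   field names, the byte counts in the usage example and the
   expand() helper are invented for illustration, and no file
   format was defined at the workshop.

```python
import ipaddress
import random
from dataclasses import dataclass


@dataclass
class Profile:
    """One connection profile (first configuration file)."""
    name: str
    bytes_to_server: int
    bytes_to_client: int
    duration_s: float


@dataclass
class Scenario:
    """One line of the test case (second configuration file)."""
    profile: str
    count: int
    source_net: str
    destination: str   # "address:port"


def expand(case, profiles, rng=None):
    """Turn one test case line into a list of concrete connections.

    Each connection is (source address, destination address,
    destination port, profile) with a randomly picked source.
    """
    rng = rng or random.Random()
    profile = profiles[case.profile]
    hosts = list(ipaddress.ip_network(case.source_net).hosts())
    dst_addr, dst_port = case.destination.rsplit(":", 1)
    return [
        (str(rng.choice(hosts)), dst_addr, int(dst_port), profile)
        for _ in range(case.count)
    ]
```

   The example from the text would then read
   expand(Scenario("http", 100, "192.168.100.0/24",
   "10.1.2.3:80"), profiles), where profiles maps "http" to a
   Profile with made-up traffic figures.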

   The implementation would be as kernel threads, using the
   in-kernel sockets API in combination with the TCP zerocopy
   send path. This way we can avoid implementing our own TCP
   stack - but are bound to the limitations of the linux stack
   (in case of scalability due to overhead of sockets api, number
   of filedescriptors, ...).
     _________________________________________________________

2.11.2.2. Rusty's approach to the connection generator

   Rusty thinks we definitely need such a test tool (and we
   should aim it to become the industry standard test tool for
   firewalls) - but with a different architecture. As opposed to
   Harald's very primitive design, he thinks one should start the
   project from the other side: analyzing real-world
   connections and classifying them into categories, deriving
   profiles of common properties of all connections within one
   class, etc. A connection profile would then resemble the exact
   timing characteristics of a connection. It would tell us at
   which sequence number one end was waiting for the reply, when
   packets had been dropped, how often retransmissions happened,
   whether selective acknowledgement was used, ...

   The counterpart would then be a replay program that could
   replay a connection according to a previously-sampled
   connection. However, it would be able to speed up the
   connection (by assuming a fixed latency but emulating higher
   bandwidth, ...), and play back tens of thousands of connections
   at the same time.

   The two ends of a connection would always be terminated on the
   same machine (using two network interfaces). This way we can
   always be certain about which packets have arrived at what
   time, and don't need any out-of-band synchronization between
   sender and receiver. We'd also react realistically to
   increased latency introduced by the currently tested firewall.

   The implementation would not use the existing tcp/ip stack,
   but rather use a raw socket to gain full control over all
   timings.
     _________________________________________________________

2.11.2.3. Summary

   Since there is significant interest in such a project, and
   even two companies are interested in financially supporting it,
   we should continue discussion on the design. At the end of
   this discussion, a design paper could be published, and a
   first implementation can be started. We should definitely try
   to get comments from the academic community, since there might
   be existing research in this field.
     _________________________________________________________

2.12. GPL Violations

   - GPL violation - current state - legal action?
     _________________________________________________________

2.13. patch-o-matic

   The coreteam decided case by case on the future of each
   patch-o-matic patch. The outcome is summarized by the
   following table:

   Patch                                 Decision
   -----                                 --------
   57_ip_nat-macro-args.patch            apply 2.4.23-pre / 2.6.x
   55_ipt_unclean-tcp-flag-table.patch   apply 2.4.23-pre
   57_conntrack-tcp-nopickup.patch       remove
   HL.patch.ipv6                         stay
   REJECT.patch.ipv6                     needs fix!
   fuzzy6.patch.ipv6                     combine with nth+random
   nth6.patch.ipv6                       combine
   random6.patch.ipv6                    combine
   IPV4OPTSSTRIP.patch                   stay
   NETLINK.patch                         stay
   NETMAP.patch                          submit 2.6
   SAME.patch                            submit 2.6
   TTL.patch                             stay
   connlimit.patch                       needs auditing
   iprange.patch                         submit 2.6
   ipv4options.patch                     stay because implementation
   mport.patch                           stay, too trivial
   nth.patch                             combine
   pool.patch                            stay
   fuzzy.patch                           combine
   psd.patch                             submit 2.6, needs codingstyle change
   quota.patch                           stay
   random.patch                          combine
   realm.patch                           submit after kaber made indent fix
   u32.patch                             defer, skb-end/tail bug, ...
   condition6.patch                      stay
   ownercmd6.patch                       defer until ipv4 is compatible again
   CLASSIFY.patch                        submit, MODULE_AUTHOR missing
   CONNMARK.patch                        submit, CONFIG_IP_NF_CONNTRACK_MARK?
   IPMARK.patch                          stay
   ROUTE.patch                           stay
   addrtype.patch                        defer until kaber fixes
   condition.patch                       stay
   connbytes.patch                       stay, replaced by ctstat/netflow
   cuseeme-nat.patch                     defer, no nat support
   h323-conntrack-nat.patch              move to broken
   ipt_TARPIT.patch                      defer, fix it to use raw table
   iptables-loopcheck-speedup.patch      delete!!!
   mms-conntrack-nat.patch               stay, no free server
   docbook-patch                         stay
   nfnetlink-ctnetlink-0.11.patch        defer, lockup on SMP!!
   owner-socketlookup.patch              submit, ask dave about exported symbol
   pptp-conntrack-nat.patch              defer until bugs fixed
   quake3-conntrack.patch                apply, no free client+server
   rpc-conntrack.patch                   stay, not enough users / testers
   rsh-conntrack.patch                   submit (overwrites name !!!)
   string.patch                          defer, faster implementation
   talk.patch                            submit if I find users
     _________________________________________________________

3. TODO list for 2.6.0-testX

   2.6.0 todo:

     * pom rewrite (rusty: python, C)
     * remove MIRROR from 2.6 kernel
     * remove unclean from 2.6 kernel
     * EXPERIMENTAL marks ???
     * local nat issues
     * add 'all safe netfilter modules' config option
     * push all compatibility-breaking stuff to dave
     * make ip_queue -> nf_queue (l3 independent)
     * submit 03-ipt_REJECT-bridgefix.patch