Goals:
- reduce locking
- take advantage of 'hot' caches
- better locality
Locking reduction
New flow spare pool. The global pool is implemented as a list of blocks,
where each block holds 100 spare flows. Worker threads fetch a block at
a time, storing the block in thread-local storage.
Flow Recycler now returns flows to the pool in blocks as well.
Flow Recycler fetches all flows to be processed in one step instead of
one at a time.
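In sketch form the block idea looks roughly like this; the names and
layout below are illustrative assumptions, not the actual implementation:

#include <stdint.h>
#include <pthread.h>

typedef struct Flow_ { struct Flow_ *next; } Flow; /* stub for illustration */

#define SPARE_BLOCK_SIZE 100

typedef struct SpareBlock_ {
    struct SpareBlock_ *next; /* next block in the global pool list */
    uint32_t count;           /* spare flows in this block (<= 100) */
    Flow *flows;              /* linked list of the spare flows */
} SpareBlock;

static SpareBlock *g_pool;
static pthread_mutex_t g_pool_m = PTHREAD_MUTEX_INITIALIZER;

/* Workers take a whole block per lock operation and keep it in
 * thread-local storage; handing out individual flows is then lock-free. */
static SpareBlock *SpareBlockGet(void)
{
    pthread_mutex_lock(&g_pool_m);
    SpareBlock *b = g_pool;
    if (b != NULL)
        g_pool = b->next;
    pthread_mutex_unlock(&g_pool_m);
    return b;
}

/* The Flow Recycler returns flows the same way: one block, one lock. */
static void SpareBlockReturn(SpareBlock *b)
{
    pthread_mutex_lock(&g_pool_m);
    b->next = g_pool;
    g_pool = b;
    pthread_mutex_unlock(&g_pool_m);
}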
Cache 'hot'ness
Worker threads now check the timeout of flows they evaluate during lookup.
The worker will have to read the flow into cache anyway, so the added
overhead of checking the timeout value is minimal. When a flow is considered
timed out, one of two things happens (see the sketch after this list):
- if the flow is 'owned' by the thread it is handled locally. Handling means
checking if the flow needs 'timeout' work.
- otherwise, the flow is added to a special 'evicted' list in the flow
bucket where it will be picked up by the flow manager.
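A minimal sketch of that check, with stub types and illustrative names
(lastts, evicted and the ownership test are assumptions):

#include <stdint.h>

/* Stubs; real structures and names differ. */
typedef struct Flow_ {
    struct Flow_ *next;
    uint32_t lastts;    /* last activity, seconds */
    uint32_t timeout;   /* timeout policy for this flow, seconds */
    int thread_id;      /* owning thread */
} Flow;
typedef struct FlowBucket_ { Flow *evicted; } FlowBucket;

/* The flow is in cache anyway for the hash compare, so reading lastts
 * costs close to nothing. */
static void LookupTimeoutCheck(FlowBucket *fb, Flow *f, uint32_t now, int my_id)
{
    if (now < f->lastts + f->timeout)
        return; /* not timed out */

    if (f->thread_id == my_id) {
        /* owned by this thread: do any needed 'timeout' work locally */
    } else {
        /* hand off via the bucket's evicted list for the flow manager */
        f->next = fb->evicted;
        fb->evicted = f;
    }
}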
Flow Manager timing
By default the flow manager now tries to do passes of the flow hash in
smaller steps, where the goal is to do a full pass in 8 x the lowest timeout
value it has to enforce. So if the lowest timeout value is 30s, a full pass
will take 4 minutes. The goal here is to reduce locking overhead and not
get in the way of the workers.
In emergency mode each pass is full, and lower timeouts are used.
Timing of the flow manager also no longer relies on pthread condition
variables, as these generally wake up much sooner than the desired
timeout. Instead a simple (u)sleep loop is used.
Both changes reduce the number of hash passes a lot.
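To illustrate the timing math and the sleep loop, a sketch under assumed
names; only the 8x budget and the usleep loop come from the text above:

#include <stdint.h>
#include <unistd.h>

/* Sketch: one full hash pass budgeted at 8 x the lowest timeout,
 * e.g. 30s -> a 240s (4 minute) pass, done in one-second slices. */
static void FlowManagerPassSketch(uint32_t hash_size, uint32_t min_timeout,
                                  int emergency)
{
    uint32_t pass_sec = emergency ? 1 : 8 * min_timeout;
    uint32_t rows_per_slice = hash_size / pass_sec;
    if (rows_per_slice == 0)
        rows_per_slice = 1;

    for (uint32_t row = 0; row < hash_size; ) {
        uint32_t end = row + rows_per_slice;
        if (end > hash_size)
            end = hash_size;
        /* ... check rows [row, end) for timed out flows ... */
        row = end;

        /* plain usleep instead of pthread_cond_timedwait, which tended
         * to wake much sooner than the desired timeout */
        if (!emergency)
            usleep(1000000);
    }
}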
Emergency behavior
In emergency mode there are a number of changes to the workers. In this scenario
the flow memcap is fully used up and it is unavoidable that some flows won't
be tracked.
1. flow spare pool fetches are reduced to once a second. This avoids locking
   overhead in a situation where the chance of success is very low.
2. getting an active flow directly from the hash skips flows that had very
recent activity to avoid the scenario where all flows get only into the
NEW state before getting reused. Rather allow some to have a chance of
completing.
3. TCP packets that are not SYN packets will not get a used flow, unless
   stream.midstream is enabled. The goal here is again to avoid evicting
   active flows unnecessarily (see the sketch after this list).
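A sketch of rule 3 above; the packet fields and the function name are
illustrative assumptions:

#include <stdbool.h>

typedef struct Packet_ { bool is_tcp; bool is_tcp_syn; } Packet; /* stub */

/* Sketch: in emergency mode, only hand a forcibly reused ('used') flow
 * to a packet that can actually start a session. */
static bool EmergencyMayTakeUsedFlow(const Packet *p, bool midstream_enabled)
{
    if (p->is_tcp && !p->is_tcp_syn && !midstream_enabled)
        return false; /* don't evict an active flow for a mid-stream packet */
    return true;
}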
Better locality
The Flow Manager now injects flows into the worker threads, instead of one
or two packets per flow. The advantage is that the worker threads can get
packets from their local packet pools, avoiding the constant overhead of
packets returning to 'foreign' pools.
Counters
A lot of flow counters have been added and some have been renamed.
Overall the worker threads increment 'flow.wrk.*' counters, while the flow
manager increments 'flow.mgr.*'.
Additionally, none of the counters are snapshots anymore; they all increment
over time. The flow.memuse and flow.spare counters are exceptions.
Misc
FlowQueue has been split into a FlowQueuePrivate (unlocked) and FlowQueue.
Flow no longer has 'prev' pointers and now uses a unified 'next' pointer for
both hash and queue use.
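In sketch form; field names are illustrative:

typedef struct Flow_ {
    struct Flow_ *next; /* hash row chain OR queue chain, never both */
    /* ... flow state ... */
} Flow;

typedef struct FlowQueuePrivate_ { /* the unlocked variant */
    Flow *top;
    Flow *bot;
    unsigned int len;
} FlowQueuePrivate;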
Expose the UTH flow builder under the new 'FUZZ' define as well. Also move
UTHbufferToFile and rename it to the more generic 'TestHelperBufferToFile'.
This way UNITTESTS can be disabled, which leads to smaller code size
and more realistic testing, as in some parts of the code things
behave slightly differently when UNITTESTS are enabled.
Previously each 'TmSlot' had its own packet queue that was passed
to the registered SlotFunc as an argument. This was used mostly for
tunnel packets by the decoders and by defrag.
This patch removes that in favor of a single queue in the ThreadVars:
decode_pq. This is the non-locked version of the queue as this is
only a temporary store for handling packets within a thread.
This patch removes the PacketQueue pointer argument from the API.
The new queue can be accessed directly through the ThreadVars
pointer.
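Roughly, with stub types standing in for the real structures:

#include <stdint.h>

/* Stub types; the real structures hold much more. */
typedef struct Packet_ { struct Packet_ *next; } Packet;
typedef struct PacketQueueNoLock_ { Packet *top; uint32_t len; } PacketQueueNoLock;
typedef struct ThreadVars_ { PacketQueueNoLock decode_pq; } ThreadVars;

/* Before, a decoder splitting off a tunnel packet enqueued it on the
 * PacketQueue argument; now it reaches the queue via tv. Simplified
 * LIFO push here; ordering details of the real queue differ. */
static void DecodeEnqueueSketch(ThreadVars *tv, Packet *tunnel_p)
{
    tunnel_p->next = tv->decode_pq.top; /* unlocked: thread-local only */
    tv->decode_pq.top = tunnel_p;
    tv->decode_pq.len++;
}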
On MinGW the result of ntohl needs to be cast to uint32_t and
the result of ntohs to uint16_t. To avoid doing this everywhere,
add SCNtohl and SCNtohs macros.
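The macros presumably wrap the cast once, along these lines (a sketch,
not necessarily the exact definitions):

#include <stdint.h>
#include <arpa/inet.h> /* ntohl/ntohs; winsock2.h on MinGW */

#if defined(__MINGW32__)
#define SCNtohl(x) ((uint32_t)ntohl((x)))
#define SCNtohs(x) ((uint16_t)ntohs((x)))
#else
#define SCNtohl(x) ntohl((x))
#define SCNtohs(x) ntohs((x))
#endif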
Set flags by default:
-Wmissing-prototypes
-Wmissing-declarations
-Wstrict-prototypes
-Wwrite-strings
-Wcast-align
-Wbad-function-cast
-Wformat-security
-Wno-format-nonliteral
-Wmissing-format-attribute
-funsigned-char
Fix minor compiler warnings for these new flags on gcc and clang.
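For a sense of what two of these flags catch, a small illustrative
example (not taken from the actual fixes):

/* -Wwrite-strings: string literals become const char[], so this warns: */
char *bad_name = "suricata";        /* warning: discards const */
const char *good_name = "suricata"; /* ok */

/* -Wmissing-prototypes: externally visible functions need a prior
 * prototype; internal helpers should simply be static: */
static int helper(void) { return 0; }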
When we run on live traffic, time handling is simple. Packets have a
timestamp set by the capture method. Management threads can simply
use 'gettimeofday' to know the current time. There should never be
any serious gap between the two or major differences between the
threads.
In offline mode, things are dramatically different. Here we try to keep
the time from the pcap, which means that if the packets are recorded in
2011 the log output should also reflect this. Multiple issues:
1. merged pcaps might have huge time jumps or time going backward
2. slowly recorded pcaps may be processed much faster than their
'realtime'
3. management threads need a concept of what the 'current' time is for
enforcing timeouts
4. due to (1) individual threads may have very different views on what
the current time is. E.g. T1 processed packet 1 with TS X, while T2
at the very same time processes packet 2 with TS X+100000s.
The changes in flow handling make the problems worse. The capture thread
no longer handles the flow lookup, yet it still set the global 'time'.
This meant that a thread may be working on Packet 1 with TS 1, while the
capture thread already saw packet 2 with TS 10000. Management threads
would take TS 10000 as the 'current time', considering a flow created by
the first thread as timed out immediately.
This was less of a problem before the flow changes as the capture thread
would also create a flow reference for a packet, meaning the flow
couldn't time out as easily. Packets in the queues between capture
thread and workers would all hold such references.
The patch updates the time handling to be as follows.
In offline mode we keep the timestamp per thread. If a management thread
needs the current time, it will get the minimum of the threads' values. This
is to avoid the problem that T2's time value might already trigger a flow
timeout, as flow lastts + 100000s almost certainly means the
flow would be considered timed out.
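A minimal sketch of that minimum rule, with a stub per-thread time
structure:

#include <sys/time.h>

typedef struct ThreadTime_ { struct timeval ts; } ThreadTime; /* stub */

/* Sketch: a management thread takes the minimum of all packet threads'
 * timestamps, so a fast thread (T2 at X+100000s) can't prematurely
 * time out flows still being created by a slower one. */
static struct timeval OfflineTimeGet(const ThreadTime *threads, int nthreads)
{
    struct timeval min = threads[0].ts;
    for (int i = 1; i < nthreads; i++) {
        if (timercmp(&threads[i].ts, &min, <))
            min = threads[i].ts;
    }
    return min;
}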
Instead of handling the packet update during flow lookup, handle
it in the stream/detect threads. This lowers the load of the
capture thread(s) in autofp mode.
The decoders now set a flag in the packet if the packet needs a
flow lookup. Then the workers will take care of this. The decoders
also already calculate the raw flow hash value. This is so that
this value can be used in flow balancing in autofp.
Because the flow lookup/creation is now done in the worker threads,
the flow balancing can no longer use the flow. It's not yet
available. Autofp load balancing uses raw hash values instead.
Along the same lines, move the UDP AppLayer handling out of the DecodeUDP
module and into the stream/detect threads as well.
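In sketch form; the flag name and fields are assumptions for
illustration:

#include <stdint.h>

#define PKT_WANTS_FLOW 0x01 /* illustrative flag name */
typedef struct Packet_ {    /* stub */
    uint32_t flags;
    uint32_t flow_hash;     /* raw hash; autofp balances on this */
} Packet;

/* Decoder side: mark the packet and precompute the raw hash. */
static void DecodeMarkForFlow(Packet *p, uint32_t raw_hash)
{
    p->flow_hash = raw_hash; /* the flow itself doesn't exist yet */
    p->flags |= PKT_WANTS_FLOW;
}

/* Worker side: the actual lookup/creation happens here, later. */
static void WorkerFlowLookupSketch(Packet *p)
{
    if (p->flags & PKT_WANTS_FLOW) {
        /* FlowHandlePacket(...) runs here, in the worker thread */
    }
}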
Handle TCP session reuse inside the flow engine itself. If a looked up
flow matches the packet, but is a TCP stream starter, check if the
ssn needs to be reused. If that is the case handle it within the
lookup function. This simplifies the locking and removes potential race
conditions.
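A rough sketch of the idea; the real reuse test inspects SYN flags and
ssn state, and the names below are assumptions:

#include <stdbool.h>

typedef struct Flow_ { bool ssn_reuse_needed; } Flow; /* stub */

/* Sketch: the reuse decision runs inside the lookup while the hash row
 * lock is still held, so no other thread can race on the same flow. */
static Flow *FlowLookupReuseSketch(Flow *candidate)
{
    if (candidate->ssn_reuse_needed) {
        /* detach the old ssn and reset flow state here, under the
         * same lock that protected the lookup itself */
    }
    return candidate;
}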
Most flows are marked for clean up by the flow manager, which then
passes them to the recycler. The recycler logs and cleans up. However,
under resource stress conditions, the packet threads can recycle an
existing flow directly. In that case the recycler has no role to play, as
the flow is immediately reused.
For this reason, the packet threads need to be able to invoke the
flow logger directly.
The flow logging thread ctx will be stored in the DecodeThreadVars
structure. Therefore, this patch makes the DecodeThreadVars an argument
to FlowHandlePacket.
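In sketch form, the signature change (argument names assumed):

typedef struct ThreadVars_ ThreadVars;
typedef struct DecodeThreadVars_ DecodeThreadVars;
typedef struct Packet_ Packet;

/* before: void FlowHandlePacket(ThreadVars *tv, Packet *p); */
/* after: dtv carries the flow logging thread ctx, so a packet thread
 * that recycles a flow directly can invoke the flow logger itself */
void FlowHandlePacket(ThreadVars *tv, DecodeThreadVars *dtv, Packet *p);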
1. Assign signums after sig ordering, so the numbering reflects the
priority of sigs (priority wise) inside staging.
Previously we would assign signums before sig ordering, and hence the
order didn't actually reflect the order of the sig in the
sig_list (assuming sig reordering changed the sig_list). Staging would
use the old sig_nums to decide the priority of sigs.
2. Fix sig ordering for flowvar, flowbits, flowint, pktvar sigs. We have
introduced a new priority to treat sigs with set + read as lower
priority compared to set only sigs.
3. Previously we treated a sig whose "priority" (keyword) value was greater
   than another sig's as the higher priority sig. We have reversed this:
   the sig priority ordering is now 1, 2, etc., with lower values meaning
   higher priority. Updated the sig ordering unittests to reflect the same.
When handling the error case of SCMalloc, SCCalloc or SCStrdup
we are on an unlikely path. This patch adds the unlikely()
expression to indicate this to gcc.
This patch has been obtained via coccinelle. The transformation
is the following:
@istested@
identifier x;
statement S1;
identifier func =~ "(SCMalloc|SCStrdup|SCCalloc)";
@@
x = func(...)
... when != x
- if (x == NULL) S1
+ if (unlikely(x == NULL)) S1
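Applied to a call site, the transformation yields code like this
(SCMalloc stubbed for illustration):

#include <stdlib.h>
#include <string.h>

#define unlikely(expr) __builtin_expect(!!(expr), 0)
#define SCMalloc malloc /* stand-in for the real allocator wrapper */

static char *CopySketch(const char *src, size_t len)
{
    char *x = SCMalloc(len);
    /* was: if (x == NULL) return NULL; */
    if (unlikely(x == NULL)) /* error path hinted as cold to gcc */
        return NULL;
    memcpy(x, src, len);
    return x;
}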
In stateful detection, only inspect the file portion of the rule after all
other conditions have matched. This is to prevent "filestore" from tagging
files for storage during a partial match.
Add a couple of unittests to test the behaviour change.
Introduce a separate FlowAddress structure for holding the IPv4 or IPv6
address. Unlike the Address structure, it doesn't carry the family; instead
the family is stored in the flow as a flag: FLOW_IPV4 and FLOW_IPV6.
Add macros to check the family, copy the address, etc.
Update many unittests to reflect these changes. Introduce unittest helper
functions for creating and initializing a flow and freeing it again.
On 64 bit this shrinks the flow by 8 bytes.
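A sketch of the resulting layout, close to but not necessarily exactly
the real definition:

#include <stdint.h>

#define FLOW_IPV4 0x01 /* the family now lives in flow->flags */
#define FLOW_IPV6 0x02

/* No family member: just 16 bytes of raw address. */
typedef struct FlowAddress_ {
    union {
        uint32_t address_un_data32[4];
        uint16_t address_un_data16[8];
        uint8_t  address_un_data8[16];
    } address;
} FlowAddress;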
This patch fixes compilation on the OpenBSD platform. It runs
fine on a pcap file. The patch should also fix compilation on the
WIN32 platform, but this is not tested.
This patch fixes Packet initialisation. In some places the pkt field
was not set after a memset that zeroed the structure, which could
lead to problems.
This patch implements the needed modification of payload access
in a Packet structure to support the abstraction introduced by
the extended data system.
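The abstraction presumably looks along these lines: accessor macros hide
whether the payload sits in the packet's inline buffer or in externally
allocated ('extended') storage (a sketch, field names assumed):

#include <stdint.h>

typedef struct Packet_ {   /* stub */
    uint8_t pkt[1514];     /* inline storage for normal-sized packets */
    uint8_t *ext_pkt;      /* extended storage for oversized packets */
    uint32_t pktlen;
} Packet;

/* callers go through the macros and never touch p->pkt directly */
#define GET_PKT_LEN(p)  ((p)->pktlen)
#define GET_PKT_DATA(p) (((p)->ext_pkt == NULL) ? (p)->pkt : (p)->ext_pkt)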