To simplify locking, move all locking out of the individual detect
code. Instead, lock the flow at the start of detection and unlock it
at the end.
The lua code can still be called without a lock (from the output
code paths), so a lock hint is still passed around to take care of this.
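Roughly, the resulting pattern looks like this (a sketch only, assuming
Suricata's FLOWLOCK_* macros; the helper names are illustrative):

    /* lock once around the whole detection run */
    if (p->flow != NULL)
        FLOWLOCK_WRLOCK(p->flow);

    DetectRun(th_v, de_ctx, det_ctx, p);  /* detect code assumes lock is held */

    if (p->flow != NULL)
        FLOWLOCK_UNLOCK(p->flow);

    /* lua invoked from output paths may run without the lock, so it
     * gets a hint telling it whether it needs to lock the flow itself */
    LuaScriptRun(lua_ctx, p->flow, /* flow_locked */ 0);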
Instead of handling the packet update during flow lookup, handle
it in the stream/detect threads. This lowers the load of the
capture thread(s) in autofp mode.
The decoders now set a flag in the packet if the packet needs a
flow lookup. The workers then take care of this. The decoders
also already calculate the raw flow hash value, so that it can be
used for flow balancing in autofp.
Because the flow lookup/creation is now done in the worker threads,
the flow balancing can no longer use the flow. It's not yet
available. Autofp load balancing uses raw hash values instead.
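A rough sketch of the split, assuming Suricata's Packet fields and a
packet flag along these lines (names illustrative):

    /* decoder: no flow lookup, just mark the packet and compute the hash */
    p->flags |= PKT_WANTS_FLOW;
    p->flow_hash = FlowGetHash(p);     /* raw hash, used for autofp balancing */

    /* worker (stream/detect) thread: do the actual lookup/creation */
    if (p->flags & PKT_WANTS_FLOW)
        FlowHandlePacket(tv, dtv, p);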
Along the same lines, move the UDP AppLayer calls out of the DecodeUDP
module, and also into the stream/detect threads.
Handle TCP session reuse inside the flow engine itself. If a looked up
flow matches the packet, but is a TCP stream starter, check if the
ssn needs to be reused. If that is the case handle it within the
lookup function. This simplifies the locking and removes potential race
conditions.
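Sketched, the reuse check inside the lookup could look like this (the
helper names are illustrative, the types are Suricata's):

    /* flow and bucket are locked at this point */
    if (FlowCompare(f, p) == 1 &&
        p->proto == IPPROTO_TCP && p->tcph != NULL &&
        (p->tcph->th_flags & TH_SYN) &&
        TcpSessionReuseCheck(p, f) == 1) {
        /* detach the old ssn and hand out a fresh flow, still under
         * the same bucket lock */
        f = FlowReuseReplace(fb, f, p);
    }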
Update Flow lookup functions to get a flow reference during lookup.
This reference is set under the FlowBucket lock.
This paves the way to not getting a flow lock during lookups.
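In sketch form (assuming the FBLOCK_* macros and FlowReference() from
the flow code; the bucket lookup helper is illustrative):

    FBLOCK_LOCK(fb);
    Flow *f = FlowLookupInBucket(fb, p);
    if (f != NULL)
        FlowReference(&p->flow, f);    /* ref taken under the bucket lock */
    FBLOCK_UNLOCK(fb);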
Stream GAPs and stream reassembly depth are tracked per direction. In
many cases they will happen in one direction, but not in the other.
Example:
HTTP requests are generally smaller than responses, so on the response
side we may hit the depth limit, but not on the request side.
The asynchronous 'disruption' has a side effect in the transaction
engine. The 'progress' tracking would never mark such transactions
as complete, and thus some inspection and logging wouldn't happen
until the very last moment: when EOFs are passed around.
Especially in proxy environments with _very_ many transactions in a
single TCP connection, this could lead to serious resource issues. The
EOF handling would suddenly have to handle thousands or more
transactions. These transactions would have been stored for a long time.
This patch introduces the concept of disruption flags. Flags passed to
the tx progress logic that are an indication of disruptions in the
traffic or the traffic handling. The idea is that the progress is
marked as complete on disruption, even if a tx is not complete. This
allows the detection and logging engines to process the tx after which
it can be cleaned up.
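A minimal sketch of how a caller might pass these flags into the
progress check (flag and function names are modeled on Suricata's
parser API, exact signatures may differ):

    uint8_t flags = 0;
    if (stream_depth_reached || stream_gap)   /* illustrative conditions */
        flags |= STREAM_DEPTH;

    int progress = AppLayerParserGetStateProgress(ipproto, alproto, tx, flags);
    /* with a disruption flag set, progress reports the tx as complete
     * even if it isn't, so it can be inspected, logged and freed */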
The lastts timeval struct field in the flow records the timestamp of
the last packet that updated the flow. This allows for tracking the
flow's timeout. So far, this value was both updated and read under the
flow lock.
This patch moves the updating of this field to the FlowGetFlowFromHash
function, where it is updated at the point where both the Flow and the
Flow Hash Row lock are held. This guarantees that the field is only
updated when both locks are held.
This makes reading the field safe when either lock is held, which is the
purpose of this patch.
The flow manager, while holding the flow hash row lock, can now safely
read the lastts value. This allows it to do the flow timeout check
without actually locking the flow.
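Sketched (macro names as used in Suricata's flow code, the timeout
helpers are illustrative):

    /* FlowGetFlowFromHash(): both locks held, the only writer */
    FBLOCK_LOCK(fb);
    FLOWLOCK_WRLOCK(f);
    COPY_TIMESTAMP(&p->ts, &f->lastts);

    /* flow manager: only the hash row lock held, reading is safe */
    FBLOCK_LOCK(fb);
    if (FlowIsTimedOut(f, ts))      /* reads f->lastts without the flow lock */
        FlowTimeoutHandle(f);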
A flow has 3 states: NEW, ESTABLISHED and CLOSED.
For all protocols except TCP, a flow is in state NEW as long as just one
side of the conversation has been seen. When both sides have been
observed the state is moved to ESTABLISHED.
TCP has a different logic, controlled by the stream engine. Here the TCP
state is leading.
Until now, when parts of the engine needed to know the flow state, they
would invoke a per protocol callback 'GetProtoState'. For TCP this would
return the state based on the TcpSession.
This patch changes this logic. It introduces an atomic variable in the
flow 'flow_state'. It defaults to NEW and is set to ESTABLISHED for non-
TCP protocols when we've seen both sides of the conversation.
For TCP, the state is updated from the TCP engine directly.
The goal is to allow for access to the state without holding the Flow's
main mutex lock. This will later allow the Flow Manager(s) to evaluate
the Flow w/o interrupting it.
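In sketch form (SC_ATOMIC_* macros as used throughout Suricata; the
enum values follow the description above, the field names are
illustrative):

    enum FlowState { FLOW_STATE_NEW = 0, FLOW_STATE_ESTABLISHED, FLOW_STATE_CLOSED };

    /* non-TCP: flip to ESTABLISHED once both directions have been seen */
    if (f->todstpktcnt > 0 && f->tosrcpktcnt > 0)
        SC_ATOMIC_SET(f->flow_state, FLOW_STATE_ESTABLISHED);

    /* TCP: the stream engine updates the state directly, e.g. on close */
    SC_ATOMIC_SET(f->flow_state, FLOW_STATE_CLOSED);

    /* readers, such as the flow manager, don't need the flow mutex */
    int state = SC_ATOMIC_GET(f->flow_state);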
Most flows are marked for clean up by the flow manager, which then
passes them to the recycler. The recycler logs and cleans up. However,
under resource stress conditions, the packet threads can recycle an
existing flow directly. So here the recycler has no role to play, as
the flow is immediately reused.
For this reason, the packet threads need to be able to invoke the
flow logger directly.
The flow logging thread ctx will be stored in the DecodeThreadVars
structure. Therefore, this patch makes the DecodeThreadVars an argument
to FlowHandlePacket.
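Sketched (the output call is modeled on Suricata's flow output API; the
condition and field names are illustrative):

    void FlowHandlePacket(ThreadVars *tv, DecodeThreadVars *dtv, Packet *p);

    /* under resource stress the packet thread reuses an existing flow
     * directly, so it has to log it itself before reusing it */
    if (emergency_reuse)
        (void)OutputFlowLog(tv, dtv->output_flow_thread_data, old_f);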
Rework the app layer API in app-layer.[ch], app-layer-detect-proto.[ch]
and app-layer-parser.[ch].
Things addressed in this commit:
- Brings out a proper separation between protocol detection phase and the
parser phase.
- The dns app layer now is registered such that we don't use "dnstcp" and
"dnsudp" in the rules. A user who previously wrote a rule like this -
"alert dnstcp....." or
"alert dnsudp....."
would now have to use,
alert dns (ipproto:tcp;) or
alert udp (app-layer-protocol:dns;) or
alert ip (ipproto:udp; app-layer-protocol:dns;)
The same rules extend to another such protocol, dcerpc.
- The app layer parser api now takes in the ipproto while registering
callbacks (see the registration sketch after this list).
- The app inspection/detection engine also takes an ipproto.
- All app layer parser functions now take direction as STREAM_TOSERVER or
STREAM_TOCLIENT, as opposed to 0 or 1, which was taken by some of the
functions.
- FlowInitialize() and FlowRecycle() now reset proto to 0. This is
needed by unittests, which would try to clean the flow and in doing so
call the api AppLayerParserCleanupParserState() to clean the app state.
The app layer now needs an ipproto to figure out which api to internally
call to clean the state, and if the ipproto is 0, it returns without
trying to clean the state.
- A lot of unittests are now updated so that if they use a flow and
need the app layer, the flow's ipproto is set.
- The "app-layer" section in the yaml conf has also been updated.
By moving FlowReference() out of FlowGetFlowFromHash() and into the one
function that calls it, all the flow functions take const Packet * instead
of Packet *.
Tilera's GCC supports the GCC __sync_ intrinsics.
Increase the size of some atomic variables for better performance on
Tile. The Tile-Gx architecture has native support for 32-bit and
64-bit atomic operations, but not 8-bit and 16-bit, which are emulated
using 32-bit atomics, so changing some 16-bit and 8-bit atomics into
ints improves performance.
Increasing the size of the atomic variables modified in this change
does not increase the total size of the structures in which they
reside because of existing padding requirements. The one case that
would increase the size of the structure (Flow_) was conditionalized
to only change the size on Tile.
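A sketch of the conditional sizing (the predefine and typedef are
illustrative):

    #if defined(__tilegx__)
        typedef unsigned int FlowRefCount;    /* native 32-bit atomic */
    #else
        typedef unsigned short FlowRefCount;  /* keep Flow_ size unchanged */
    #endif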
When handling the error case of SCMalloc, SCCalloc or SCStrdup
we are in an unlikely case. This patch adds the unlikely()
expression to indicate this to gcc.
This patch has been obtained via coccinelle. The transformation
is the following:
@istested@
identifier x;
statement S1;
identifier func =~ "(SCMalloc|SCStrdup|SCCalloc)";
@@
x = func(...)
... when != x
- if (x == NULL) S1
+ if (unlikely(x == NULL)) S1
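Applied to a typical allocation site, the transformed code looks like
this (the surrounding error handling is a generic example, not a
specific call site from the patch):

    SomeCtx *ctx = SCMalloc(sizeof(*ctx));
    if (unlikely(ctx == NULL)) {
        SCLogError(SC_ERR_MEM_ALLOC, "failed to allocate ctx");
        return -1;
    }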
clang was issuing some warnings related to unused return values in
functions. This patch adds some needed error handling and ignores the
rest of the warnings by adding a cast to void.
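For calls where the return value is intentionally ignored, the cast
looks like this (a generic example):

    (void)write(fd, buf, len);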
Major redesign of the flow engine. Remove the flow queues that turned
out to be major choke points when using many threads. Flow manager now
walks the hash table directly. Simplify the way we get a new flow in
case of emergency.
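Roughly, the manager now walks the rows directly (field and helper
names are based on Suricata's flow hash, but partly illustrative):

    for (uint32_t idx = 0; idx < flow_config.hash_size; idx++) {
        FlowBucket *fb = &flow_hash[idx];
        FBLOCK_LOCK(fb);
        for (Flow *f = fb->head; f != NULL; f = f->hnext) {
            if (FlowManagerFlowTimeout(f, state, &ts))
                FlowManagerFlowTimedOut(f, &ts);  /* hand to the recycler */
        }
        FBLOCK_UNLOCK(fb);
    }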