Fix an evasion possibility by rejecting packets with a broken ACK field.
These packets have a non-zero ACK field, but do not have the ACK flag set.
Bug #3324.
Reported-by: Nicolas Adba
RST injection during the SYN_SENT state could trick Suricata into marking
a session as CLOSED. The first technique uses an invalid TSECR value in a
RST+ACK packet. The ACK was needed to force Linux into considering the
TSECR value and comparing it to the TSVAL from the SYN packet.
The second technique works only against Windows. The client would not use
a TSVAL, but the RST packet would. Windows rejects this, but Suricata
considered the RST valid and triggered the CLOSED logic.
This patch addresses both. When the SYN packet used timestamp support,
the timestamp of the incoming packet is validated. Otherwise, the
responding packet should not have a timestamp.
Bug #3286
Reported-by: Nicolas Adba
Since the vlan.use-for-tracking setting is now handled in flow-hash.c,
we can fill in the vlan_id fields unconditionally. This makes the vlanh
fields unnecessary.
Related to https://redmine.openinfosecfoundation.org/issues/3076
This fixes redmine bug #2057 by setting the pseudo packet's iface and vlan
from the flow's values, solving the problem of missing vlan/iface when the
pseudo packet gets logged/alerted on.
No longer set stream events after a gap or wrong thread. We know
we lost sync and are now in 'let's make the best of it' mode. No
point in flooding the system with stream events.
Ticket #2484
Set event at most once per flow, for the first 'wrong' packet.
Add 'tcp.pkt_on_wrong_thread' counter. This is incremented for each
'wrong' packet. Note that the first packet for a flow determines
what thread is 'correct'.
Use this in places where we need the outer right edge of our
sequence space.
This way we can avoid walking the tree to find it, which is a
potentially expensive operation.
To improve worst case performance turn the segments list into a rbtree.
This greatly improves inserts, lookups and removals if the number of
segments gets very large.
The tree is sorted by the segment sequence number as its primary key.
If two segments have the same seq, the payload_len (segment length) is
used as a tie breaker, so the larger segment is placed after the smaller
segment. Exact matches are not added to the tree.
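To illustrate the sort order, a compare function for such a tree could
look roughly like the sketch below (types, macros and names are
illustrative, not the actual implementation):
```
#include <stdint.h>

/* Minimal sketch of the ordering described above; the types, macros and
 * function name are simplified stand-ins, not the actual Suricata code. */
typedef struct TcpSegmentSketch_ {
    uint32_t seq;          /* sequence number: primary key */
    uint16_t payload_len;  /* segment length: tie breaker */
} TcpSegmentSketch;

/* wrap-around aware sequence number comparisons */
#define SEQ_LT(a, b) ((int32_t)((a) - (b)) < 0)
#define SEQ_GT(a, b) ((int32_t)((a) - (b)) > 0)

/* returns <0 / 0 / >0; a result of 0 is an exact match, which is not
 * inserted into the tree */
static int TcpSegmentCompareSketch(const TcpSegmentSketch *a,
                                   const TcpSegmentSketch *b)
{
    if (SEQ_LT(a->seq, b->seq))
        return -1;
    if (SEQ_GT(a->seq, b->seq))
        return 1;
    if (a->payload_len < b->payload_len)
        return -1;
    if (a->payload_len > b->payload_len)
        return 1;
    return 0;
}
```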
In case of a valid RST on a SYN, the state is switched to 'TCP_CLOSED'.
However, the target of the RST may not have received it, or may not
have accepted it. Also, the RST may have been injected, so the supposed
sender may not actually be aware of the RST that was sent in its name.
In this case the previous behavior was to switch the state to CLOSED and
accept no further TCP updates or stream reassembly.
This patch changes this. It still switches the state to CLOSED, as this
is by far the most likely to be correct. However, it will reconsider
the state if the receiver continues to talk.
To do this, on each state change the previous state is recorded in
TcpSession::pstate. If a non-RST packet is received after a RST, this
TcpSession::pstate is used to try to continue the conversation.
If the (supposed) sender of the RST is also continuing the conversation
as normal, it's highly likely it didn't send the RST. In this case
a stream event is generated.
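A minimal sketch of how this could be wired up, with simplified names
and state values (not the actual code):
```
#include <stdint.h>

/* Rough sketch of the pstate mechanism; state values, struct layout and
 * function names are illustrative only. */
enum { SK_TCP_NONE = 0, SK_TCP_ESTABLISHED, SK_TCP_CLOSED };

typedef struct TcpSessionSketch_ {
    uint8_t state;   /* current state */
    uint8_t pstate;  /* state before the last transition */
} TcpSessionSketch;

/* every state change remembers where we came from */
static void SketchSetState(TcpSessionSketch *ssn, uint8_t new_state)
{
    ssn->pstate = ssn->state;
    ssn->state = new_state;
}

/* a non-RST packet arriving while CLOSED (after a RST) falls back to the
 * pre-RST state; the caller would also raise a stream event here */
static void SketchReconsiderAfterRst(TcpSessionSketch *ssn)
{
    if (ssn->state == SK_TCP_CLOSED)
        ssn->state = ssn->pstate;
}
```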
Ticket: #2501
Reported-By: Kirill Shipulin
When upgrading to TLS from HTTP, logging of the final HTTP tx could
have the wrong direction. This was because the original packet
triggering/finalizing the upgrade was used as the base for both the
toserver and toclient pseudo packets, meaning it was wrong in one
direction.
This patch creates a pseudo packet in the same way as the flow timeout
code does, so it no longer takes the raw original packet in.
Bug #2430
This rule will match on the STREAM_3WHS_ACK_DATA_INJECT event, which is
set if we:
- are in IPS mode
- get a data packet from the server
- and that packet matches the exact SEQ/ACK expectations for the 3whs
The action of the rule is set to drop, as the stream engine will drop
anyway. So the rule action is not actually needed, but for consistency
it is drop.
If we have only seen the SYN and SYN/ACK of the 3whs, accept data from
the server if it perfectly matches the SEQ/ACK expectations. This
might happen in 2 scenarios:
1. packet loss: if we lost the final ACK, we may get data that fits
this pattern (e.g. an SMTP EHLO message).
2. MOTS/MITM packet injection: an attacker can send a data packet
together with its SYN/ACK packet. The client due to timing almost
certainly gets the SYN/ACK before considering the data packet,
and will respond with the final ACK before processing the data
packet.
In IDS mode we will accept the data packet and rely on the reassembly
engine to warn us if the packet was indeed injected.
In IPS mode we will drop the packet. In the packet loss case we will
rely on retransmissions to get the session back up and running. For
the injection case we blocked this injection attempt.
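The 'perfect match' condition boils down to something like this sketch,
assuming both initial sequence numbers are tracked (names and the exact
checks are simplifications of the real stream engine logic):
```
#include <stdint.h>
#include <stdbool.h>

/* Illustrative only: does a to-client data packet exactly match the 3whs
 * expectations when just the SYN and SYN/ACK have been seen? The real
 * stream engine checks more than this. */
static bool SketchDataMatches3whs(uint32_t pkt_seq, uint32_t pkt_ack,
                                  uint32_t server_isn, uint32_t client_isn)
{
    /* server data should start right after its SYN/ACK and should ack
     * exactly the client's SYN (client_isn + 1) */
    return pkt_seq == server_isn + 1 && pkt_ack == client_isn + 1;
}
```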
The detect engine would bypass packets that are set as dropped. This
seems sane, as these packets are going to be dropped anyway.
However, it led to the following corner case: stream events that
triggered the drop could not be matched by rules. The packet
with the event wouldn't make it to the detect engine due to the bypass.
This patch changes the logic to not bypass DROP packets anymore.
Packets that are dropped by the stream engine will set the no payload
inspection flag, to avoid needless cost.
RAND_MAX is not guaranteed to be a divisor of ULONG_MAX, so take the
necessary precautions to get unbiased random numbers. Although the
bias might be negligible, it's not advisable to rely on it.
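The usual precaution is rejection sampling; a sketch of the idea (not
the exact code changed here):
```
#include <stdlib.h>

/* Sketch of the rejection-sampling precaution: redraw whenever the value
 * falls into the tail that would bias the modulo reduction. Illustrative
 * only. */
static unsigned long RandomUnbiasedSketch(unsigned long upper_bound)
{
    /* largest multiple of upper_bound that fits in [0, RAND_MAX] */
    unsigned long limit = ((unsigned long)RAND_MAX + 1UL) -
                          (((unsigned long)RAND_MAX + 1UL) % upper_bound);
    unsigned long r;
    do {
        r = (unsigned long)rand();
    } while (r >= limit);
    return r % upper_bound;
}
```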
This adds new functions that will be called through the unix socket
and permit updating and showing the memcap value.
The memcap value needs to be handled in a thread-safe way, so it is
declared as an atomic variable.
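In C11 terms the thread-safe handling amounts to something like the
sketch below; the actual code uses Suricata's atomic helpers and
different names:
```
#include <stdatomic.h>
#include <stdint.h>

/* C11 sketch of thread-safe memcap handling; names are illustrative. */
static _Atomic uint64_t stream_memcap_sketch;

/* called from the unix-socket command handler */
static void SketchMemcapSet(uint64_t value)
{
    atomic_store(&stream_memcap_sketch, value);
}

/* called from packet threads when checking the cap */
static uint64_t SketchMemcapGet(void)
{
    return atomic_load(&stream_memcap_sketch);
}
```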
There are several NULL-pointer derefs in StreamTCPInitConfig. All of
them happen because ConfGet returns 1 even if the value is NULL (due to
misconfiguration, for example).
This commit introduces a new function, "ConfGetValue". It adds a
NULL-pointer check on top of ConfGet and can be used as a replacement
for ConfGet.
Note: simply modifying ConfGet might not be a good idea, because there
are some places where ConfGet should return 1 even if "value" is NULL,
for example when ConfGet should get a config leaf in the YAML hierarchy.
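A minimal sketch of the wrapper idea; the ConfGet prototype shown is an
assumption and the body is illustrative, not the actual ConfGetValue
implementation:
```
#include <stddef.h>

/* Assumed ConfGet-style prototype: returns 1 on success and writes the
 * value pointer, which may legitimately be NULL for a key with no value. */
extern int ConfGet(const char *name, const char **vptr);

/* Sketch of the wrapper idea: a NULL value is treated as failure, so
 * callers can safely dereference the result. */
static int ConfGetValueSketch(const char *name, const char **vptr)
{
    if (ConfGet(name, vptr) != 1)
        return 0;
    if (*vptr == NULL)
        return 0;
    return 1;
}
```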
Bug: 2354
The older random functions returned random values in the range of
0 - RAND_MAX. This is what the http randomize code was expecting.
Newer methods, based on getrandom (or probably Windows too), return
a much larger range of values, including negative values and values
above RAND_MAX.
This patch adds a wrapper to turn the returned value into the expected
range before using it in the http code.
The same is true for the stream engine.
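The wrapper boils down to something like this sketch (name and exact
reduction are assumptions):
```
#include <stdlib.h>
#include <stdint.h>

/* Sketch: fold a wide random value (possibly negative or > RAND_MAX, as
 * returned by getrandom-backed helpers) back into the 0..RAND_MAX range
 * the older callers expect. Illustrative only. */
static long SketchRandomToRandMaxRange(long value)
{
    uint64_t v = (uint64_t)value;                 /* drop the sign */
    return (long)(v % ((uint64_t)RAND_MAX + 1));  /* 0..RAND_MAX */
}
```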
With a large number of threads the default memcaps lead to pool setup
failures. Make sure these are reported properly so that the user
knows what is going on.
Bug: #2226
The reason the stream engine can't easily decide to bypass streams
is that there can be non-stream dependent rules that wouldn't match
if bypassing is done too aggressively.
However, if there is no detection engine, there is no reason to hold
back. In this case we can bypass as soon as the stream engine is done
with a session.
Observed:
STARTTLS creates 2 pseudo packets which are tied to a real packet.
TPR (tunnel packet ref) counter increased to 2.
Pseudo 1: goes through 'verdict', increments 'ready to verdict' to 1.
Packet pool return code frees this packet and decrements TPR in root
to 1. RTV counter not changed. So both are now 1.
Pseudo 2: verdict code sees RTV == TPR, so verdict is set based on
pseudo packet. This is too soon. Packet pool return code frees this
packet and decrements TPR in root to 0.
Real packet: TPR is 0 so set verdict on this packet. As the verdict was
already set, NFQ reports an issue.
The decrementing of TPR doesn't seem to make sense as RTV is not
updated.
Solution:
This patch refactors the ref count and verdict count logic. The beef
is now handled in the generic function TmqhOutputPacketpool(). NFQ
and IPFW call a utility function VerdictTunnelPacket to see if they
need to verdict a packet.
Remove some unused macros for managing these counters.
TCP reassembly is now deactivated more frequently, and triggering a
bypass on it resulted in missing some alerts due to forgetting
about packet-based signatures.
So this patch introduces a dedicated flag that can be set in
the app layer and transmitted via the stream engine to trigger bypass.
It is currently used by the SSL app layer to trigger bypass when
the stream becomes encrypted.
Suricata was unconditionally dropping packets that are invalid with
respect to the stream engine. In some corner cases, like asymmetric
traffic capture, this led to dropping some legitimate traffic.
The async-oneside option did help, but this was not perfect in some
real life cases. So this patch introduces an option that allows the
user to tell Suricata not to drop packets that are invalid with
respect to the stream engine.
Initialize midstream with async if enabled. Unset async on seeing
bidirectional traffic.
If only async-oneside is enabled, set ASYNC flag on session creation
when receiving a SYN packet.
Let last_ack stay in sync with next_seq so that various checks work
better.
When switching protocol from http to tls the following corner case
was observed:
pkt 6, TC "200 connection established"
pkt 7, TS acks pkt 6 + adds "client hello"
pkt 8 TC, acks pkt 7
pkt 8 is where normally the detection on the "200 connection established"
would run. However, before detection runs, the app-layer is called
and it resets the state.
So the issue is missed detection on the last data in the original
protocol before the switch.
Another case was:
TS -> STARTTLS
TC -> Ack "STARTTLS data"
220
TS -> Ack "220 data"
Client Hello
In IDS mode, this made a rule that wanted to look at content:"STARTTLS"
in combination with the protocol SMTP 'alert smtp ... content:"STARTTLS";'
impossible. By the time the content would match, the protocol was already
switched.
This patch fixes this case by creating a 'Detect/Log Flush' packet in
both directions. This will force final inspection and logging of the
pre-upgrade protocol (SMTP in this example) before doing the final
switch.
Set flags by default:
-Wmissing-prototypes
-Wmissing-declarations
-Wstrict-prototypes
-Wwrite-strings
-Wcast-align
-Wbad-function-cast
-Wformat-security
-Wno-format-nonliteral
-Wmissing-format-attribute
-funsigned-char
Fix minor compiler warnings for these new flags on gcc and clang.
Instead of killing all reassembly instantly, do things slightly more
gracefully:
1. disable app-layer reassembly immediately
2. flag raw reassembly not to accept new data
This will allow the current data to be inspected still.
After detect has run, the raw reassembly will be fully disabled and
thus all reassembly will be as well.
Now that detect moves the raw progress forward, it's important
to deal with the case where detect doesn't consider raw inspection.
If no 'stream' rules are active, disable raw. For this the disable
raw flag is now per stream.
At flow timeout, we no longer need to first run reassembly in
one dir, then inspection in the other. We can do both in single
packet now.
Disable pseudo packets when receiving stream end packets. Instead
call the app-layer parser in the packet direction for stream end
packets and flow end packets.
These changes in handling of those stream end packets make the
pseudo packets unnecessary.
Remove the 'StreamMsg' approach from the engine. In this approach the
stream engine would create a list of chunks for inspection by the
detection engine. There were several issues:
1. the messages had a fixed size, so blocks of data bigger than ~4k
would be cut into multiple messages
2. it led to lots of data copying and unnecessary memory use
3. the StreamMsgs used a central pool
The Stream engine switched over to the streaming buffer API, which
means that the reassembled data is always available. This made the
StreamMsg approach even clunkier.
The new approach exposes the streaming buffer data to the detection
engine. It has to pay attention to an important issue though: packet
loss. The data may have gaps. The streaming buffer API tracks the
blocks of continuous data.
To access the data for inspection a callback approach is used. The
'StreamReassembleRaw' function is called with a callback and data.
This way it runs the MPM and individual rule inspection code. At
the end of each detection run the stream engine is notified that it
can move its 'progress' forward.
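The callback shape can be pictured roughly like this; the typedef and
names are assumptions, not the real StreamReassembleRaw prototype:
```
#include <stdint.h>

/* Illustrative shape of the callback approach: the stream engine hands
 * the detection engine a block of reassembled data plus its offset in
 * the stream. Names and signatures are assumptions. */
typedef int (*StreamRawCallbackSketch)(void *cb_data, const uint8_t *data,
                                       uint32_t data_len,
                                       uint64_t stream_offset);

static int DetectEngineRawCallbackSketch(void *cb_data, const uint8_t *data,
                                         uint32_t data_len,
                                         uint64_t stream_offset)
{
    (void)cb_data; (void)data; (void)data_len; (void)stream_offset;
    /* run MPM and individual rule inspection on this block here */
    return 0;
}
```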
Make the stream engine use the streaming buffer API for its data storage.
This means that the data is stored in a single reassembled sliding
buffer. The subtleties of the reassembly, e.g. overlap handling, are
taken care of at segment insertion.
The TcpSegments now have a StreamingBufferSegment that contains an
offset and a length. Using this the segment data can be retrieved
per segment.
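Conceptually, such a per-segment reference needs little more than the
sketch below (field names are illustrative):
```
#include <stdint.h>

/* Illustrative shape of the per-segment reference into the streaming
 * buffer: an offset plus a length is enough to retrieve the segment's
 * data from the single reassembled buffer. */
typedef struct StreamingBufferSegmentSketch_ {
    uint64_t stream_offset;  /* where the segment's data starts */
    uint32_t segment_len;    /* how many bytes it covers */
} StreamingBufferSegmentSketch;
```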
Redo segment insertion. The insertion code is moved to its own file
and is simplified a lot.
A major difference with the previous implementation is that the segment
list now contains overlapping segments if the traffic is that way.
Previously there could be more and smaller segments in the memory list
than what was seen on the wire.
Due to the matching of in-memory segments and on-the-wire segments,
the detection of overlaps with different data (potential MOTS attacks)
is much more accurate.
Raw and App reassembly progress is no longer tracked per segment using
flags, but there is now a progress tracker in the TcpStream for each.
When pruning we make sure we don't slide beyond in-use segments. When
both app-layer and raw inspection are beyond the start of the segment
list, the segments might not be freed even though the data in the
streaming buffer is already gone. This is caused by the 'in-use' status
that the segments can implicitly have. This patch accounts for that
when calculating the 'left_edge' of the streaming window.
Raw reassembly still sets up 'StreamMsg' objects for content
inspection. They are set up based on either the full StreamingBuffer,
or based on the StreamingBufferBlocks if there are gaps in the data.
Reworked 'stream needs work' logic. When a flow times out the flow
engine checks whether a TCP flow still needs work. The
StreamNeedsReassembly function is used to test if a stream still has
unreassembled segments or uninspected stream chunks.
This patch updates the function to consider the app and/or raw
progress. It also cleans the function up and adds more meaningful
debug messages. Finally it makes it non-inline.
Unittests have been overhauled, and partly moved into their own files.
Remove lots of dead code.
Issue:
https://redmine.openinfosecfoundation.org/issues/2041
One approach to fixing this issue is to just validate the
checksum instead of regenerating it and comparing it. This
method is used in some kernels and other network tools.
When validating, the current checksum is passed in as an
initial argument which will cause the final checksum to be 0
if OK. If generating a checksum, 0 is passed and the result
is the generated checksum.
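The idea can be illustrated with a generic one's-complement checksum
sketch: pass the stored checksum as 'init' to validate (a result of 0
means OK), or pass 0 to generate. This is a simplified stand-in, not
the actual Suricata helpers:
```
#include <stdint.h>
#include <stddef.h>

/* One's-complement checksum sketch: 'data' covers the header words minus
 * the checksum field; 'init' is the stored checksum (validate) or 0
 * (generate). Illustrative only. */
static uint16_t ChecksumSketch(const uint16_t *data, size_t len_bytes,
                               uint32_t init)
{
    uint32_t sum = init;
    while (len_bytes > 1) {
        sum += *data++;
        len_bytes -= 2;
    }
    if (len_bytes == 1)
        sum += *(const uint8_t *)data;   /* trailing odd byte */
    while (sum >> 16)                    /* fold carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```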
Suricata should not completely bypass a flow before both ends of it
have reached the stream depth or have reached a certain state.
The justification is that Suricata needs the ACK to handle the other
side, so we can't really decide to cut only one side.
This patch activates bypass for encrypted flows and for flows
that have reached the stream depth on both sides.
For encrypted flows, Suricata stops the inspection, so we can just
get them out via bypass. The same logic applies to flows that have
reached the stream depth.
For a basic test of feature, use the following ruleset:
```
table ip filter {
chain output {
type filter hook output priority 0; policy accept;
ct mark 0x1 counter accept
oif lo counter queue num 0
}
chain connmark_save {
type filter hook output priority 1; policy accept;
mark 0x1 ct mark set mark counter
ct mark 0x1 counter
}
}
```
And use a bypass mark and mask of 1 in the nfq configuration. Then you
can test the system by scp'ing a big file to 127.0.0.1. You can also
use iperf to measure the performance on localhost. It is recommended
to lower the MTU to 1500 to get something more realistic by increasing
the number of packets.
Until now the flow manager would walk the entire flow hash table on an
interval. It would thus touch all flows, leading to a lot of memory
and cache pressure. In scenarios where the number of tracked flows runs
into the hundreds of thousands, and the memory used can run into many
hundreds of megabytes or even gigabytes, this would lead to serious
performance degradation.
This patch introduces a new approach. A timestamp per flow bucket
(hash row) is maintained by the flow manager. It holds the timestamp
of the earliest possible timeout of a flow in the list. The hash walk
skips rows with timestamps beyond the current time.
As the timestamp depends on the flows in the hash row's list, and on
the 'state' of each flow in the list, any addition of a flow or
changing of a flow's state invalidates the timestamp. The flow manager
then has to walk the list again to set a new timestamp.
A utility function FlowUpdateState is introduced to change Flow states,
taking care of the bucket timestamp invalidation while at it.
Empty flow buckets use a special value so that we don't have to take
the flow bucket lock to find out the bucket is empty.
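The per-row decision then reduces to a cheap unlocked comparison,
roughly as sketched here (the sentinel value and names are assumptions):
```
#include <stdint.h>
#include <stdbool.h>

/* Sketch of the per-row skip check: each bucket caches the earliest
 * possible flow timeout, and a sentinel marks an empty row so it can be
 * skipped without taking the bucket lock. Illustrative only. */
#define SKETCH_ROW_EMPTY (-1)

static bool SketchFlowRowNeedsCheck(int64_t row_next_ts, int64_t now_ts)
{
    if (row_next_ts == SKETCH_ROW_EMPTY)
        return false;               /* empty row: skip, no locking needed */
    return row_next_ts <= now_ts;   /* only walk rows that may time out */
}
```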
This patch also adds more performance counters:
flow_mgr.flows_checked | Total | 929
flow_mgr.flows_notimeout | Total | 391
flow_mgr.flows_timeout | Total | 538
flow_mgr.flows_removed | Total | 277
flow_mgr.flows_timeout_inuse | Total | 261
flow_mgr.rows_checked | Total | 1000000
flow_mgr.rows_skipped | Total | 998835
flow_mgr.rows_empty | Total | 290
flow_mgr.rows_maxlen | Total | 2
flow_mgr.flows_checked: number of flows checked for timeout in the
last pass
flow_mgr.flows_notimeout: number of flows out of flow_mgr.flows_checked
that didn't time out
flow_mgr.flows_timeout: number of flows out of flow_mgr.flows_checked
that did reach the timeout
flow_mgr.flows_removed: number of flows out of flow_mgr.flows_timeout
that were really removed
flow_mgr.flows_timeout_inuse: number of flows out of flow_mgr.flows_timeout
that were still in use or needed work
flow_mgr.rows_checked: hash table rows checked
flow_mgr.rows_skipped: hash table rows skipped because none of the flows
would time out anyway
The counters below only relate to rows that were not skipped.
flow_mgr.rows_empty: empty hash rows
flow_mgr.rows_maxlen: max number of flows per hash row. Best to keep low,
so increase hash-size if needed.
flow_mgr.rows_busy: rows skipped because they were locked by another thread
Now that the FlowWorker handles the TCP Stream directly, having
the TCP engine as a thread module is no longer needed.
This patch removes the registration.
Initial version of the 'FlowWorker' thread module. This module
combines Flow handling, TCP handling, App layer handling and
Detection in a single module. It does all flow related processing
under a single flow lock.
When we run on live traffic, time handling is simple. Packets have a
timestamp set by the capture method. Management threads can simply
use 'gettimeofday' to know the current time. There should never be
any serious gap between the two or major differences between the
threads.
In offline mode, things are dramatically different. Here we try to keep
the time from the pcap, which means that if the packets are recorded in
2011 the log output should also reflect this. Multiple issues:
1. merged pcaps might have huge time jumps or time going backward
2. slowly recorded pcaps may be processed much faster than their
'realtime'
3. management threads need a concept of what the 'current' time is for
enforcing timeouts
4. due to (1) individual threads may have very different views on what
the current time is. E.g. T1 processed packet 1 with TS X, while T2
at the very same time processes packet 2 with TS X+100000s.
The changes in flow handling make the problems worse. The capture thread
no longer handles the flow lookup, while it did set the global 'time'.
This meant that a thread may be working on Packet 1 with TS 1, while the
capture thread already saw packet 2 with TS 10000. Management threads
would take TS 10000 as the 'current time', considering a flow created by
the first thread as timed out immediately.
This was less of a problem before the flow changes as the capture thread
would also create a flow reference for a packet, meaning the flow
couldn't time out as easily. Packets in the queues between capture
thread and workers would all hold such references.
The patch updates the time handling to be as follows.
In offline mode we keep the timestamp per thread. If a management thread
needs current time, it will get the minimum of the threads' values. This
is to avoid the problem that T2's time value might already trigger a flow
timeout, as the flow's lastts + 100000s almost certainly means the
flow would be considered timed out.
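A sketch of the "minimum of the threads' values" rule, assuming the
per-thread timestamps are available as an array (illustrative only):
```
#include <stdint.h>
#include <stddef.h>

/* Sketch: a management thread's view of "current time" in offline mode
 * is the minimum of the per-thread packet timestamps, so a fast thread
 * can't prematurely time out flows still handled by a slower one. */
static int64_t SketchOfflineCurrentTime(const int64_t *thread_ts, size_t n)
{
    int64_t min_ts = thread_ts[0];
    for (size_t i = 1; i < n; i++) {
        if (thread_ts[i] < min_ts)
            min_ts = thread_ts[i];
    }
    return min_ts;
}
```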
Instead of handling the packet update during flow lookup, handle
it in the stream/detect threads. This lowers the load of the
capture thread(s) in autofp mode.
The decoders now set a flag in the packet if the packet needs a
flow lookup. Then the workers will take care of this. The decoders
also already calculate the raw flow hash value. This is so that
this value can be used in flow balancing in autofp.
Because the flow lookup/creation is now done in the worker threads,
the flow balancing can no longer use the flow. It's not yet
available. Autofp load balancing uses raw hash values instead.
In the same line, move UDP AppLayer out of the DecodeUDP module,
and also into the stream/detect threads.
Handle TCP session reuse inside the flow engine itself. If a looked up
flow matches the packet, but is a TCP stream starter, check if the
ssn needs to be reused. If that is the case handle it within the
lookup function. This simplifies the locking and removes potential race
conditions.
Update Flow lookup functions to get a flow reference during lookup.
This reference is set under the FlowBucket lock.
This paves the way to not getting a flow lock during lookups.
Until now, the TCP options would all be stored in the Packet structure.
The commonly used ones (wscale, ts, sack, sackok and mss*) then had a
pointer to the position in the option array. Overall this option array
was large: about 360 bytes on 64-bit systems. Since no part of the engine
would ever access this array other than through the common shortcuts,
this was actually just wasteful.
This patch changes the approach. It stores just the common ones in the
packet. The rest is gone. This shrinks the packet structure by almost
300 bytes.
* even though mss wasn't actually used
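Conceptually the packet now carries just the common option values,
roughly like this sketch (fields and sizes are illustrative, not the
real layout):
```
#include <stdint.h>

/* Illustrative sketch of storing only the commonly used TCP option
 * values directly in the packet instead of a large option array. */
typedef struct TcpOptsSketch_ {
    uint16_t mss;        /* kept even though mss wasn't actually used */
    uint8_t  wscale;     /* window scaling */
    uint8_t  sack_ok;    /* SACK permitted */
    uint32_t ts_val;     /* timestamp value */
    uint32_t ts_ecr;     /* timestamp echo reply */
    /* SACK blocks would still need a small pointer/offset, not shown */
} TcpOptsSketch;
```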
If stream.inline setting was missing it would default to IDS.
This patch changes the default to 'auto', meaning that in IPS mode
the stream engine also uses IPS mode and in IDS mode it's still in
IDS mode.
Bug #1570
StreamTcpSegmentForEach would only return ACK'd segments. This led
to missing stream data in alerts when running in IPS mode.
This patch changes the behavior for IPS. All segments are iterated
now, also the non-ACK'd ones. For IDS mode the behavior is unchanged.
Store the tenant id in the flow and use the stored id when setting
up pseudo packets.
For tunnel and defrag packets, get the tenant from the parent. This will
only pass tenant_ids set at capture time.
For defrag packets, the tenant selector based on vlan id will still
work as the vlan id(s) are stored in the defrag tracker before being
passed on.
Set noinspection flags for payloads and packets on flow and stream
pseudo packets. Without these, the pseudo packets could trigger
inspection even though this was disabled for a flow.