Create traits for app-layer State and Transaction that allow
a generic implementation of a transaction iterator that parsers
can use when they follow the common pattern for iterating
transactions.
Also convert DNS to use the generic iterator for testing purposes.
Rules profiling was returning invalid results when used with a sample
rate. The problem was that the sample condition was run twice in the
packet flow. As a result, the second pass was not initializing the
variable storing the initial CPU ticks and the resulting performance
counters were reporting invalid values.
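To illustrate the pattern, here is a reduced, hypothetical sketch in C
(not Suricata's actual profiling code): if the sample decision is
evaluated independently on each pass, the end side can read start ticks
that were never recorded; deciding once and storing the result avoids that.

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical per-packet profiling data: the sampling decision and the
     * CPU ticks recorded when profiling of the packet started. */
    typedef struct PktProfile {
        int sampled;
        uint64_t ticks_start;
    } PktProfile;

    static uint64_t GetTicks(void) {
        static uint64_t fake = 0;   /* stand-in for a cycle counter like rdtsc */
        return fake += 100;
    }

    static int SampleHit(uint32_t rate) {
        return rate != 0 && (uint32_t)rand() % rate == 0;
    }

    /* Buggy pattern: the sample condition is evaluated again on the second
     * pass, so the end side can run without a matching start and read a
     * ticks_start that was never set. */
    static void ProfileEndBuggy(PktProfile *p, uint32_t rate, uint64_t *counter) {
        if (SampleHit(rate))
            *counter += GetTicks() - p->ticks_start;
    }

    /* Fixed pattern: decide once at the start of the packet, store the
     * decision, and have the end path honor it. */
    static void ProfileStart(PktProfile *p, uint32_t rate) {
        p->sampled = SampleHit(rate);
        if (p->sampled)
            p->ticks_start = GetTicks();
    }

    static void ProfileEnd(PktProfile *p, uint64_t *counter) {
        if (p->sampled)
            *counter += GetTicks() - p->ticks_start;
    }

    int main(void) {
        uint64_t good = 0, bad = 0;
        PktProfile p = { 0, 0 };
        ProfileStart(&p, 4);
        ProfileEnd(&p, &good);        /* only counts packets that were sampled */
        ProfileEndBuggy(&p, 4, &bad); /* may count ticks that were never started */
        printf("good=%" PRIu64 " bad=%" PRIu64 "\n", good, bad);
        return 0;
    }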
Bug: #4836.
Ticket: #4569
If a FIN+SYN packet is sent, the destination may keep the
connection alive instead of starting to close it.
In this case, a later SYN packet will be ignored by the
destination.
Previously, Suricata considered this a session reuse, and thus
used the sequence number of the last SYN packet instead of
the one from the live connection, leading to evasion.
This commit errors out on FIN+SYN packets so that they are not
processed as regular FIN packets.
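For illustration, a minimal check of the kind involved, using the
conventional BSD-style TCP flag values; this is a sketch, not the actual
stream-engine code:

    #include <stdint.h>
    #include <stdio.h>

    #define TH_FIN 0x01
    #define TH_SYN 0x02

    /* A packet carrying both SYN and FIN is treated as an error/event rather
     * than as a regular FIN, so it can no longer be used to overwrite the
     * sequence tracking of a live session. */
    static int TcpFlagsAreSynFin(uint8_t th_flags) {
        return (th_flags & (TH_SYN | TH_FIN)) == (TH_SYN | TH_FIN);
    }

    int main(void) {
        printf("SYN+FIN: %d\n", TcpFlagsAreSynFin(TH_SYN | TH_FIN)); /* 1 */
        printf("FIN only: %d\n", TcpFlagsAreSynFin(TH_FIN));         /* 0 */
        return 0;
    }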
Special handling for RST packets if they have a TCP MD5 or AO header option.
The options hash can't be validated. The end host might be able to validate
it, as it can have a key/password that was communicated out of band.
The sender could use this to move the TCP state to 'CLOSED', leading to
a desync of the TCP session.
This patch builds on top of
843d0b7a10 ("stream: support RST getting lost/ignored")
It flags the receiver as having received an RST and moves the TCP state
into the CLOSED state. It then reverts this if the sender continues to
send traffic. In this case it sets the following event:
stream-event:suspected_rst_inject;
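For illustration, the sketch below walks a TCP options area looking for
the MD5 (kind 19, RFC 2385) and AO (kind 29, RFC 5925) options. It is a
simplified stand-alone example, not the actual Suricata option parser:

    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    #define TCP_OPT_EOL  0
    #define TCP_OPT_NOP  1
    #define TCP_OPT_MD5  19   /* RFC 2385 TCP MD5 signature */
    #define TCP_OPT_AO   29   /* RFC 5925 TCP Authentication Option */

    /* Walk the TCP options area and report whether an MD5 or AO option is
     * present. The hash itself cannot be validated without the out-of-band
     * key, which is why an RST carrying one is only provisionally honored. */
    static int TcpOptsHaveMd5OrAo(const uint8_t *opts, size_t len) {
        size_t i = 0;
        while (i < len) {
            const uint8_t kind = opts[i];
            if (kind == TCP_OPT_EOL)
                break;
            if (kind == TCP_OPT_NOP) {
                i++;
                continue;
            }
            if (i + 1 >= len)
                break;                      /* truncated option */
            const uint8_t olen = opts[i + 1];
            if (olen < 2 || i + olen > len)
                break;                      /* malformed option length */
            if (kind == TCP_OPT_MD5 || kind == TCP_OPT_AO)
                return 1;
            i += olen;
        }
        return 0;
    }

    int main(void) {
        /* NOP followed by an 18-byte MD5 option (digest bytes left zeroed) */
        const uint8_t opts[19] = { TCP_OPT_NOP, TCP_OPT_MD5, 18 };
        printf("md5/ao present: %d\n", TcpOptsHaveMd5OrAo(opts, sizeof(opts)));
        return 0;
    }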
Bug: #4710.
Ticket: #4562
As the data which triggered the opposing side
was of the same protocol and not another one,
the protocol change failed.
This prevents a memory leak in a later call to AppLayerParserParse,
which would otherwise allocate a new state and leak the old one.
When InspectionBufferGet gets called with base_id,
a later InspectionBufferSetup must also be called with base_id.
In case there were transforms, we had base_id != list_id.
Not calling InspectionBufferSetup with the right id
left a dangling pointer,
because the buffer was not added to det_ctx->inspect.to_clear_queue.
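A simplified model of the pattern (all names here are hypothetical and
the real API differs in detail): the id used to fetch a buffer must also
be the id used to set it up, because setup is what registers the buffer
on the to-clear queue:

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical, heavily simplified model of a per-thread table of
     * inspection buffers plus the queue of ids to reset after each packet. */
    typedef struct Buf { const uint8_t *data; uint32_t len; int initialized; } Buf;

    typedef struct ThreadCtx {
        Buf bufs[32];
        int to_clear[32];
        int to_clear_cnt;
    } ThreadCtx;

    /* Look up the buffer slot for an id (the "get" step). */
    static Buf *BufGet(ThreadCtx *ctx, int id) {
        return &ctx->bufs[id];
    }

    /* Initialize a slot and register it for end-of-packet cleanup (the
     * "setup" step). If this is called with a different id than the get
     * step used, the wrong slot gets registered: the filled slot is never
     * cleared and keeps a dangling data pointer after the packet. */
    static void BufSetup(ThreadCtx *ctx, int id, Buf *b,
                         const uint8_t *data, uint32_t len) {
        if (!b->initialized) {
            ctx->to_clear[ctx->to_clear_cnt++] = id;
            b->initialized = 1;
        }
        b->data = data;
        b->len = len;
    }

    /* End-of-packet cleanup driven by the to-clear queue. */
    static void BufsClear(ThreadCtx *ctx) {
        for (int i = 0; i < ctx->to_clear_cnt; i++)
            memset(&ctx->bufs[ctx->to_clear[i]], 0, sizeof(Buf));
        ctx->to_clear_cnt = 0;
    }

    int main(void) {
        static const uint8_t data[] = "example";
        ThreadCtx ctx = { 0 };
        const int base_id = 3;                 /* id used for the lookup */
        Buf *b = BufGet(&ctx, base_id);
        BufSetup(&ctx, base_id, b, data, 7);   /* same id: slot gets cleared */
        BufsClear(&ctx);
        return 0;
    }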
Bug: #4681.
Flows have been shown to linger for a long time without giving up their
resources. This would lead to higher memory use and memcaps getting
reached.
Three main causes have been identified:
Slow hash passes. By default the flow manager will scan the
flow hash slowly. It is based on the flow timeout settings, and with
the default config it will take 4 minutes for a full scan to be
complete. This leaves a window for flows that are timed out to linger
for minutes longer than expected.
Flow Manager yields under pressure. The per-row trylock causes work
to be delayed further. The Flow Manager will use a trylock on a hash row
and will yield immediately if the row is busy. This means that it will
take a full pass before the row is revisited again. If the row holds
busy flows, this could happen many times in a row.
Flow Manager favors evicted flows over active flows. The Flow Manager
will only process the evicted flows if they are present. These flows
have been evicted by workers. The active flows on that hash row will
have to wait until the next hash pass. Of course by then there could
be more evicted flows.
Combined, these factors could lead to flows not being considered for
freeing and logging for a very long time, potentially even indefinitely.
The patch addresses the latter two flow manager issues by no longer
using TryLock. It will now simply wait for the lock to be released and
then do its work on the row. Additionally, for each row both the evicted list
and the active flow list will be processed.
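A minimal sketch of the new per-row behavior, using hypothetical
simplified structures rather than the real flow hash types: wait for the
row lock instead of skipping busy rows, then handle the evicted list and
the active list in the same visit:

    #include <pthread.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical, simplified flow hash row; the real structures carry
     * much more state. */
    typedef struct Flow_ {
        struct Flow_ *next;
        uint64_t lastts;
    } Flow;

    typedef struct FlowRow_ {
        pthread_mutex_t lock;
        Flow *evicted;   /* flows already evicted by worker threads */
        Flow *active;    /* flows still live on this hash row */
    } FlowRow;

    static void FlowHandleEvicted(Flow *f) { (void)f; /* log and free */ }
    static int  FlowIsTimedOut(const Flow *f, uint64_t now) { return f->lastts < now; }
    static void FlowMarkForEviction(Flow *f) { (void)f; /* hand off for logging/freeing */ }

    static void FlowManagerRow(FlowRow *row, uint64_t now) {
        /* Wait for the lock instead of a trylock: a busy row is no longer
         * skipped for a whole hash pass. */
        pthread_mutex_lock(&row->lock);

        /* 1. drain the evicted list left behind by the workers */
        for (Flow *f = row->evicted; f != NULL; ) {
            Flow *next = f->next;
            FlowHandleEvicted(f);
            f = next;
        }
        row->evicted = NULL;

        /* 2. in the same visit, also check the active flows for timeout */
        for (Flow *f = row->active; f != NULL; f = f->next) {
            if (FlowIsTimedOut(f, now))
                FlowMarkForEviction(f);
        }

        pthread_mutex_unlock(&row->lock);
    }

    int main(void) {
        FlowRow row = { PTHREAD_MUTEX_INITIALIZER, NULL, NULL };
        FlowManagerRow(&row, 1000);
        return 0;
    }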
Bug: #4650.
PacketSetData() can't fail unless the input pointer is NULL, which is
impossible from the af-packet paths calling it. Remove error check to
avoid possible branching.
The AFPSwitchState function would close the socket and free the
other resources when the interface went down _and_ the ref cnt was
0. However, in autofp mode it was common to get to this point while
packets were still being processed in the autofp worker threads, meaning
the ref cnt would not be 0. On the interface coming back up the
initialization code would overwrite the socket and rings, leading
to resource leaks.
Socket ref cnt is decremented from the v2 release callback. If the
callback brought the ref cnt to 0, the packet would not be released
to the kernel; it would (possibly) close the socket if the
iface was down, but not free the other resources.
This patch changes the logic to first release the packet to the
kernel and then decrement the ref cnt, and it makes the main receive
loop the only one responsible for opening and closing sockets. Closing
the socket and rings now waits until the ref count is 0, which can
happen after AFPSwitchState is called due to packets still being
processed by autofp worker threads.
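A sketch of the corrected ordering, with hypothetical stand-ins for the
real atomic wrappers and peer structures: the release path returns the
frame to the kernel before dropping its socket reference, and only the
receive loop closes the socket once the count reaches zero:

    #include <stdatomic.h>
    #include <stdint.h>

    #define FRAME_TO_KERNEL 0   /* same value as the kernel's TP_STATUS_KERNEL */

    /* Hypothetical stand-ins for the peer and ring frame header structures. */
    typedef struct PeerRef { atomic_int sock_users; int sock_fd; } PeerRef;
    struct FrameHdr { volatile uint32_t tp_status; };

    /* Release path in the fixed order: hand the frame back to the kernel
     * first, then drop the reference on the socket. The callback itself no
     * longer closes the socket or frees the rings. */
    static void ReleasePacketV2(struct FrameHdr *frame, PeerRef *peer) {
        frame->tp_status = FRAME_TO_KERNEL;     /* 1. return the frame to the kernel */
        atomic_fetch_sub(&peer->sock_users, 1); /* 2. drop our socket reference */
    }

    /* Only the receive loop tears the socket down, and only once all
     * outstanding packet references are gone; autofp workers may still be
     * processing packets that point at this socket. */
    static void CloseSocketWhenIdle(PeerRef *peer) {
        while (atomic_load(&peer->sock_users) > 0)
            ;                                   /* wait for the workers to finish */
        /* close(peer->sock_fd); unmap and free the rings here */
        (void)peer->sock_fd;
    }

    int main(void) {
        PeerRef peer = { .sock_users = 1, .sock_fd = -1 };
        struct FrameHdr frame = { .tp_status = 1 };
        ReleasePacketV2(&frame, &peer);
        CloseSocketWhenIdle(&peer);   /* returns immediately: ref count is now 0 */
        return 0;
    }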
Bug: #4803.
Tpacket v3 only supports workers mode, which means the packet that would
reference a socket won't leave the thread. Therefore keeping a ref count
on the socket is not needed.
This patch removes the per-packet reference count increment. The decrement
was missing, so this fixes the ref cnt handling so that capture can
recover after an iface up/down.
It should also lead to a minor performance increase as we avoid a round
of atomic operations per packet.
Bug: #4804.
Bug: #4801.
The Suricata AF_PACKET code opens a socket per thread, then after some minor
setup enters a loop where the socket is poll()'d with a timeout. When the
poll() call returns a positive value, the AF_PACKET ring will be
processed.
The ring buffer processing logic keeps a pointer to the position where we
last checked the ring. From this position we will inspect each frame until we
find a frame with tp_status == TP_STATUS_KERNEL (so essentially 0). This
means the frame is currently owned by the kernel.
There is a special case handling for starting the ring processing but
finding a TP_STATUS_KERNEL frame immediately. This logic then skips to the next
frame, reruns the check, etc. until it either finds an initialized frame or
reaches the last frame of the ring buffer.
The problem was, however, that the initial uninitialized frame was possibly
(likely?) still being initialized by the kernel. A data race between the
notification through the socket (the poll()) and the updating of the
`tp_status` field in the frame could lead to a valid frame getting skipped.
Of note is that, for example, libpcap does not do frame scanning. Instead it
simply exits its ring processing loop. Also interesting is that libpcap uses
atomic loads and stores on the tp_status field.
This skipping of frames had 2 bad side effects:
1. in most cases, the buffer would be full enough that the frame would
be processed in the next pass of the ring, but now the frame would be
out of order. This might have led to packets belonging to the same
flow getting processed in the wrong order.
2. more severe is the soft lockup case. The skipped frame sits at ring
buffer index 0. The rest of the ring has been cleared, after the
initial frame was skipped. As our pass of the ring stops at the end
of the ring (ptv->frame_offset + 1 == ptv->req.v2.tp_frame_nr), the code
exits the ring processing loop and goes back to poll(). However, poll()
will not indicate that there is more data, as the stale frame in the
ring blocks the kernel from populating more frames beyond it. This
is now a deadlock, as the kernel waits for Suricata and Suricata
never touches the ring until it hears from the kernel.
The scan logic will scan the whole ring at most once, so it won't
reconsider the stale frame either.
This patch addresses the issues in several ways:
1. the startup "discard" logic was fixed to not skip over kernel
frames. Doing so would get us in a bad state at start up.
2. Instead of scanning the ring, we now enter a busy wait loop
when encountering a kernel frame where we didn't expect one. This
means that if we got a > 0 poll() result, we'll busy wait until
we get at least one frame.
3. Error handling is unified and cleaned up. Any frame error now
returns the frame to the kernel and progresses the frame pointer.
4. If we find a frame that is owned by us (TP_STATUS_USER_BUSY) we
yield to poll() immediately, as the next expected status of that
frame is TP_STATUS_KERNEL.
5. the ring is no longer processed until the "end" of the ring (so
highest index), but instead we process at most one full ring size
per run.
6. Work with a copy of `tp_status` instead of accessing the original, which
is also touched by the kernel.
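The fixes above can be sketched roughly as follows. This is an
illustrative reduction, not the actual source-af-packet.c code: the
structures are simplified, the busy wait is left unbounded, and the
user-busy marker is assumed to be a Suricata-defined high bit:

    #include <linux/if_packet.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <stdlib.h>

    /* Stand-in for the Suricata-defined "user busy" marker mentioned above;
     * assumed here to be a high bit the kernel never sets. */
    #define FRAME_USER_BUSY (1u << 31)

    struct RingState {
        uint8_t *frames;       /* mmap'd (here: malloc'd) ring memory */
        unsigned int frame_nr; /* number of frames in the ring */
        unsigned int frame_sz; /* size of one frame */
        unsigned int offset;   /* next frame to look at */
    };

    static struct tpacket2_hdr *FrameAt(struct RingState *r, unsigned int i) {
        return (struct tpacket2_hdr *)(r->frames + (size_t)i * r->frame_sz);
    }

    /* Process at most one full ring's worth of frames after a positive poll()
     * result: read tp_status once per iteration into a local copy, wait for an
     * unexpected kernel-owned frame instead of skipping it, and yield back to
     * poll() when hitting a frame we still own. */
    static int RingRead(struct RingState *r) {
        int processed = 0;
        for (unsigned int cnt = 0; cnt < r->frame_nr; cnt++) {
            struct tpacket2_hdr *h = FrameAt(r, r->offset);

            /* local copy; the original is also written to by the kernel */
            uint32_t status = __atomic_load_n(&h->tp_status, __ATOMIC_ACQUIRE);

            /* poll() reported data, so if nothing was processed yet a
             * TP_STATUS_KERNEL frame is still being published: wait for it
             * rather than skipping it (skipping caused the out-of-order
             * packets and the stale-frame lockup described above) */
            while (status == TP_STATUS_KERNEL && processed == 0)
                status = __atomic_load_n(&h->tp_status, __ATOMIC_ACQUIRE);

            if (status == TP_STATUS_KERNEL)
                return 0;            /* caught up with the kernel, back to poll() */

            if (status & FRAME_USER_BUSY)
                return 0;            /* we still own this frame; back to poll() */

            /* hand the frame to packet processing here (omitted); the release
             * path later returns it to the kernel */
            processed++;

            r->offset = (r->offset + 1) % r->frame_nr;
        }
        return 0;                    /* at most one full ring per run */
    }

    int main(void) {
        struct RingState r = { calloc(2, 256), 2, 256, 0 };
        if (r.frames == NULL)
            return 1;
        FrameAt(&r, 0)->tp_status = TP_STATUS_USER;  /* one published frame */
        /* frame 1 stays TP_STATUS_KERNEL (0): the kernel still owns it */
        int rc = RingRead(&r);
        free(r.frames);
        return rc;
    }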
Bug: #4785.