In preparation of a patchset that will allow for disabling the detect
module, this patch introduces a way to register a function for getting
the lowest active tx id. This is used by the app layer for cleaning up
transactions that already fully inspected, and by the flow timeout code
to determine if a flow is fully inspected and logged at timeout.
The registration function RegisterAppLayerGetActiveTxIdFunc allows for
registration of a custom function of type:
uint64_t (*GetActiveTxIdFunc)(Flow *f, uint8_t flags);
If no function is called, AppLayerTransactionGetActiveDetectLog is used,
which implements the existing behaviour of considering both the
inspect_id's and the log_id.
In AppLayerTransactionsCleanup instead of figuring out 'done' tx id's
itself, now call AppLayerTransactionGetActive for both directions to
figure out the completed TX id's.
StreamMsgs would be stored in a per thread queue before being
attached to the tcp ssn. This is unnecessary, so this patch
removes this queue and puts the smsgs into the ssn directly.
Large patch as it affects a lot of tests.
StreamMsg' flow reference was used mostly to make sure a flow would
not get removed from the hash before inspection. For this it needed
to reference the flow use_cnt reference counter. Nowadays we have
more advanced flow timeout handling. This will make sure that if
there still are pending smsgs' in a flow, these will still be
processed.
Preparation for removing flow pointer from StreamMsg. Instead of
getting the ssn indirectly through StreamMsg->flow, we pass it
directly as all callers have it already.
StreamSmgs are used for raw stream reassembly only. They could also
be used to tell the rest of the engine about sequence gaps. This was
a left over from the older implementation, where the app layer used
the smsgs as well.
This patch updates the logic of the packet acquisition loop. When
the reader loop function is called and when the data to read
at offset is a without data (kernel) or still used by suricata. We
try to iter for a loop on the ring to try to find kernel put by
data.
As we are entering the function because the poll said there was some
data. This allow us to jump to the data added to the ring by the
kernel.
When using suricata in autofp mode, with multiple detect threads and
packet acquisition threads attached to a dedicated CPU, the reader
loop function was looping really fast because poll call was returning
immediatly because we did read the data available.
This patch adds a memory counter for HTP memory usage. As
there is no thread variables available in application layer
the counter has been added to the TCP reassembly thread.
This patch introduces wrapper functions around allocation functions
to be able to have a global HTP memcap. A simple subsitution of
function was not enough because allocated size needed to be known
during freeing and reallocation.
The value of the memcap can be set in the YAML and is left by default
to unlimited (0) to avoid any surprise to users.
When running with sgh-mpm-context: full, many more MPMs are created
(16K) and many are small. If they have less than 128 states, they only
need 1 byte for the next state instead of 2 bytes, cutting the size of
the next-state table in half. This reduces total memory usage.
Since that makes 3 different state sizes (1, 2 and 4 bytes), rather
than going from 2 copies of the code to create the MPM to 3, I
factored out the code that fills the next-state table into three
functions so that all the other code could be the same.
The search function is now parameterize for 8-bit and 16-bit state
sizes and alphabet sizes 8, 16, 32, 64, 128 and 256.
Live device counter was in fact the number of packets seen by suricata
and not the total number of packet reported by pfring. This patch fixes
this by using counter provided by kernel instead.
Pfring kernel counter is per socket and is not cleared after read.
So to get the number of packet on the interface we can add the new
value for this thread and add it to the interface counter.
Live device counter was in fact the number of packets seen by suricata
and not the total number of packet reported by kernel. This patch fixes
this by using counter provided by kernel instead.
The counter is Clear On Read, so by adding the value fetch at each call
and earch sockets we get the number of packets and drops for the
interface.
- Seems to be a regression introduced in the commit
796bfab231 (fix was already done in commit
ee0b21652b)
- Doesn't happen with htplib v0.5.6, but it does in the latest, v0.5.9
For AppLayerThreadCtx, AppLayerParserState, AppLayerParserThreadCtx
and AppLayerProtoDetectThreadCtx, use opaque pointers instead of
void pointers.
AppLayerParserState is declared in flow.h as it's part of the Flow
structure.
AppLayerThreadCtx is declared in decode.h, as it's part of the
DecodeThreadVars structure.
A number of unittests would lead to clang build errors because
of unsafe det_ctx ptr usage. This patch fixes these and inits
det_ctx to NULL in the other detect tests.
app-layer.[ch], app-layer-detect-proto.[ch] and app-layer-parser.[ch].
Things addressed in this commit:
- Brings out a proper separation between protocol detection phase and the
parser phase.
- The dns app layer now is registered such that we don't use "dnstcp" and
"dnsudp" in the rules. A user who previously wrote a rule like this -
"alert dnstcp....." or
"alert dnsudp....."
would now have to use,
alert dns (ipproto:tcp;) or
alert udp (app-layer-protocol:dns;) or
alert ip (ipproto:udp; app-layer-protocol:dns;)
The same rules extend to other another such protocol, dcerpc.
- The app layer parser api now takes in the ipproto while registering
callbacks.
- The app inspection/detection engine also takes an ipproto.
- All app layer parser functions now take direction as STREAM_TOSERVER or
STREAM_TOCLIENT, as opposed to 0 or 1, which was taken by some of the
functions.
- FlowInitialize() and FlowRecycle() now resets proto to 0. This is
needed by unittests, which would try to clean the flow, and that would
call the api, AppLayerParserCleanupParserState(), which would try to
clean the app state, but the app layer now needs an ipproto to figure
out which api to internally call to clean the state, and if the ipproto
is 0, it would return without trying to clean the state.
- A lot of unittests are now updated where if they are using a flow and
they need to use the app layer, we would set a flow ipproto.
- The "app-layer" section in the yaml conf has also been updated as well.
of the archaic features we use in the app layer. We will reintroduce this
parser shortly. Also do note that keywords that rely on the ssh parser
would now be disabled.
Coverity 1139544
If strdup would fail, 'node' was freed but it wasn't set to NULL. The
code then returned node. The caller would not detect there was an error
and use the freed pointer.
By moving FlowReference() out of FlowGetFlowFromHash() and into the one
function that calls it, all the flow functions take const Packet * instead
of Packet *.
A ptr to local var is stored in the radix tree currently,
this patch permits to alloc space to store host timeout
and thus also free it when data is removed.
defrag-config.c: In function 'DefragParseParameters':
defrag-config.c:105: warning: passing argument 2 of 'DefragPolicyAddHostInfo' from incompatible pointer type
make[3]: *** [defrag-config.o] Error 1
This would only happen in memory failure conditions.
util-decode-der.c:634:27: warning: Potential leak of memory pointed to by 'child'
return (Asn1Generic *)node;
stream-tcp-reassemble.c:2569:17: warning: Value stored to 'seg' is never read
seg = seg->next;
^ ~~~~~~~~~
stream-tcp-reassemble.c:2587:17: warning: Value stored to 'seg' is never read
seg = seg->next;
These were only used if debug is enabled.
app-layer-dns-tcp.c:407:13: warning: Value stored to 'length' is never read
length = *data;
app-layer-dns-udp.c:236:13: warning: Value stored to 'length' is never read
length = *data;
Detect when default_packet_size is zero, which enables zero-copy mode for
pfring and in that case, do what AF Packet does and set pkt_ext pointer to
the data and set PKT_ZERO_COPY flag.
The uint8_t *pkt in the Packet structure always points to the memory
immediately following the Packet structure. It is better to simply
calculate that value every time than store the 8 byte pointer.
Some of the fields in the SCACTileCtx struct are only used to create the MPM,
but are not needed to search the MPM. Create a new structure to contain just
the data needed by AC Search. After creating the MPM, copy the data into the
new structure and then free the memory only needed during initialization.
This reduces the size of the AC-Tile MPM context from 1360 bytes down to 296
bytes.
Add two new mPIPE load-balancing configuration options in suricata.yaml.
1) "sticky" which keep sending flows to one CPU, but if that queue is full,
don't drop the packet, move the flow to the least loaded queue.
2) Round-robin, which always picks the least full input queue for each
packet.
Allow configuring the number of packets in the input queue (iqueue) in
suricata.yaml.
For the mPipe.buckets configuration, which must be a power of 2, round
up to the next power of two, rather than report an error.
Added mpipe.min-buckets, which defaults to 256, so if the requested number
of buckets can't be allocated, Suricata will keep dividing by 2 until either
it succeeds in allocating buckets, or reaches the minimum number of buckets
and fails.
In SigMatchSignatures, the value p->flow doens't change, but GCC can't
figure that out, so it reloads p->flow many times during the function.
When p->flow is loaded into the variable pflow once at the start of the
function, the compile then doesn't need to reload it.
[src/app-layer-htp.c:1967] -> [src/app-layer-htp.c:1978]: (warning) \
Possible null pointer dereference: tx - otherwise it is redundant \
to check it against null.
pcap has a callback function that is called for each packet. Once a
second, it's meant to 'dump stats'. However, the timing logic was
broken, so it would actually dump stats for each packet.
By moving the stats second timer into the thread vars, next calls of
the callback will be able to use the stored time.
Flow timeout code worked by luck when checking if a flow still needed
reassembly for app layer inspection or logging. It would check for a
part of raw reassembly (smsg list) to determine if detection was
needed. In this case it would also process app layer cleanup,
including logging.
Introduced AppLayerTransactionGetActive which returns the lowest tx_id
in a direction that still needs some work.
FlowForceReassemblyNeedReassmbly now uses it to determine if the
applayer still needs work.
Converted FlowForceReassemblyForHash to use the checking function
FlowForceReassemblyNeedReassmbly as well, so that checking if a flow
needs work is now unified.
Raw reassembly is used only by the detection engine. For users only
caring about logging it's a significant overhead, both in cpu and
memory usage.
The option is called 'raw' and lives under the stream.reassembly
options.
stream:
memcap: 32mb
checksum-validation: yes # reject wrong csums
inline: auto # auto will use inline mode in IPS mode, yes or no set it statically
reassembly:
memcap: 64mb
depth: 1mb # reassemble 1mb into a stream
toserver-chunk-size: 2560
toclient-chunk-size: 2560
randomize-chunk-size: yes
#randomize-chunk-range: 10
raw: false # <- new option
Spotted out by clang:
source-erf-dag.h|25 col 9| warning: '__SOURCE_ERR_DAG_H__'
is used as a header guard here, followed by #define of a different macro
[-Wheader-guard]
Prevents benign log message of parent nodes of final values being
redefined (which ends up having no affect as the final nodes
are protected from being removed).
If we have multiple layer of tunnel, the decoding of initial
Packet will recurse in DecodeTunnel function called in
PacketTunnelPktSetup. If we are not setting the pseudo
packet root before calling DecodeTunnel (as done in previous
code), then the tunnel root will no be correct for the lower
layer packets. This result in an counter problem and a suricata
failure after some time.
If a session only has data in one direction, like ftp data sessions,
protocol detection will only run in one direction. This led to a
situation where reassembly would hold all the segments as proto
detection was never flagged as complete.
This patch introduces a limit for protocol detection in this case.
If the limit is reached, detection will give up.
This patch adds a '-k' option to suricata to be able to specify
the checksum validation to use. If '-k all' is used, checksum
validation is forced. If '-k none' is used, no checksum validation
is made.
Message output in case of detection of a pcap file with a probable
cheksum issue has been updated to indicate that '-k' is a solution.
This patch adds support for checksum-checks in the pcap-file running
mode. This is the same functionnality as the one already existing for
live interface.
It can be setup in the YAML:
pcap-file:
checksum-checks: auto
A message is displayed for small pcap to warn that invalid checksum
rate is big on the pcap file and that checksum-check could
be set to no.
This patch set a new value in pkt->flag to signal that a packet is
invalid during decoding. The patch has been obtained via a coccinelle
transformation.
This patch adds and increments a invalid packet counter. It
does this by introducing PacketDecodeFinalize function
This function is incrementing the invalid counter and is also
signalling the packet to CUDA.
Remove all flags indicating the buffer type. They were only used
at parse time.
Because of this the DetectPcreData_ structure could shrink to 32
bytes.
This patch replaces PacketPseudoPktSetup by a better named
PacketTunnelPktSetup function which is also in charge of doing
the decoding of the tunneled packet.
This allow to clean the code. But it also fixes an issue.
Previously, if the DecodeTunnel function was failling (cause of
an invalid packet mainly), the result was that the original packet
to be considered as a tunnel packet (and not inspected by payload
detection).
In some cases, the decoding is not possible and some really invalid
packet can be created. This is in particular the case of tunnel. In
that case, it is more interesting to forget about the tunneled
packet and only consider the original packet.
DecodeTunnel function is maked as warn_unused_result because it is
meaningful for the decoder to know if the underlying data were not
correct. And in this case, only focus detection on the content.
Fix sigatures not supporting [10.0.0.0/24, !10.1.1.1] notation when
used directly in a rule instead of through a variable.
Add tests for Bugs #815 and #920.
When inspecting HTTP bodies there are several limits involved.
In this patch the reaching of the body limit will trigger body
inspection.
Without this, the body would only be inspected when inspection
limits "request-body-minimal-inspect-size" or
"response-body-minimal-inspect-size" were reached. If the body
limit was smaller than this value, the body would only be
inspected at the end of the tx or stream.
Use the new GetIfaceOffloading function to display a warning message
if pcap capture is used on Linux with GRO or LRO activated. This is
helpful for kernel after 2.6.31 were pcap will use mmaped capture.
TPACKET_V2 is used and this limit the size of the packet resulting
in truncated packets when merged packets are received.
When the PKT_NOPAYLOAD_INSPECTION flag is set, don't apply it to smsgs.
This way we can still inspect the outstanding smsgs.
The PKT_NOPAYLOAD_INSPECTION is set for encrypted traffic, and is combined
with disabling stream reassembly. So we only inspect the smsgs up to the
point of the disable detection point.
When checking the reassembly limit for raw reassembly, consider the
STREAMTCP_STREAM_FLAG_NOREASSEMBLY a trigger immediately. We won't
process any more segments in the reassembly engine anyway.
The meta-field-option allows for setting the hard limit of request
and response fields in HTTP. In requests this applies to the request
line and headers, not the body. In responses, this applies to the
response line and headers, not the body.
Libhtp uses a default limit of 18k. If this is reached an event is
raised.
Ticket 986.
In the SSE 4.2 SCMemcmpLowercase implementation, there would be a
_mm_load_si128 of a 2 byte array. However, _mm_load_si128 loads
16 bytes, causing it to read beyond the var. I don't think this lead
to crashes, as it was a static var, but clangs ASAN complained about
it.
Share memory space for IPV4Vars and (IPV6Vars, IPV6ExtHdrs), since a
packet can only be either IPv4 or IPv6, but not both.
Share memory for TCPVars, UDPVars, ICMPV4Vars and ICMPV6Vars, since a
packet can only be only of these.
Then move other structure members around to remove holes reported by pahole.
This reduces the size of the Packet structure from 2944 bytes (46 cachelines)
down to 1976 (31 cachelines), a 33% reduction.
Strip the 'proxy' parts from the normalized uri as inspected by http_uri,
urilen, pcre /U and others.
In a request line like:
GET http://suricata-ids.org/blah/ HTTP/1.1
the normalized URI will now be:
/blah/
This doesn't affect http_raw_uri. So matching the hostname, etc is still
possible through this keyword.
Additionally, a new per HTTP 'personality' option was added to change
this behavior: "uri-include-all":
uri-include-all: <true|false>
Include all parts of the URI. By default the
'scheme', username/password, hostname and port
are excluded. Setting this option to true adds
all of them to the normalized uri as inspected
by http_uri, urilen, pcre with /U and the other
keywords that inspect the normalized uri.
Note that this does not affect http_raw_uri.
So adding uri-include-all:true to all personalities in the yaml will
restore the old default behavior.
Ticket 1008.
Move the allocation of the mPipe ingress queue from a loop over
the number of workers in the main init function to being done inside
each worker thread. This allows allocating the memory locally on the
worker's CPU without needing to figure out ahead of time where that thread
will be running. This fixes one case of static mapping of workers to CPUs.
Use __thread to hold the queue rather than a global tables of queues.
pcre_study returning NULL is not necessarily an error, from the man page
pcre_study(3):
"If the function returns NULL, either it could not find any additional
information, or there was an error. You can tell the difference by
looking at the error value. It is NULL in first case."
Older libpcre versions would return NULL, causing errors.
Keep a separate checksum for IPV4, since a packet can have both an IPV4
checksum and a TCPV4 checksum, or IPV4 and UDPV4 checksum.
This will allow future sharing of more values.
Use PACKET_RESET_CHECKSUMS() in Unit Tests in place of setting the
individual checksum values.
When http and/or tls logging is disabled, the app layer would still
be flagged as logging. This caused transactions not to be freed until
the end of the flow as the logged tx id would never increment.
This fix postpones the setting of the app layer parser "logger"
flag to the point where we know the logger is enabled.
When logging is disabled, the app layer would still be flagged
as logging. This caused transactions not to be freed until the
end of the flow as the logged tx id would never increment.
This fix postpones the setting of the app layer parser "logger"
flag to the point where we know the logger is enabled.
In the case where DNS requests are sent over the same flow w/o a
reply being received, we now set an event in the flow and refuse
to add more transactions to the state. This protects the DNS
handling from getting overloaded slowing down everything.
A new option to configure this behaviour was added:
app-layer:
protocols:
dnsudp:
enabled: yes
detection-ports:
udp:
toserver: 53
request-flood: 750
The request-flood parameter can be 0 (disabling this feature) or a
positive integer. It defaults to 500.
This means that if 500 unreplied requests are seen in a row an event
is set. Rule 2240007 was added to dns-events.rules to match on this.
Packets that are rejected by the stream engine are not considered
part of an established tcp session. By allowing them to inspect
an smsg, some smsgs would not be properly inspected.
Copied SigTest26TCPV4Keyword and added check for invalid IPV4 checksums.
Created new SigTest26TCPV4AndIPV4Keyword test with a new packet with valid
IPV4 checksums.
When multiple segments were put into a smsg, the seq would be updated
each time a segment was added. Because of this, the seq wasn't pointing
to the start of the data.
This caused some false negatives when the fast_pattern was in the raw
stream, but another part of the inspection was in the state. Because of
the wrong seq, the inspection of the smsg could be delayed. This in turn,
could make the inspection engine consider a TX inspected, even if it wasn't
fully yet.
When installing the rules to tell mPIPE to send packet to Suricata,
give it a higher priority than the default used by Linux. This way if
Linux also tells mPIPE to send it packets, Suricata will get them
instead, as long as Suricata is running.
Libhtp decodes the + character in the query string to a space by default.
Suricata rules (e.g. etpro sid 2806767) are expecting to see the space in
the http_uri buffer.
Added an option per htp config to reenable this default behavior:
query-plusspace-decode: yes
Bug #1035.
Now that we call stream reassembly directly from proto detection, we will
need to check if reassembly has been disabled inside the stream reassembly
callback.
This prevents any calls to bypass and re-enter proto detection, despite
having reassembly disabled.
Make sure we register the detect.alerts counter before packet runtime starts
even in delayed detect mode. The registration of new counters at packet
runtime is not supported by the counters api and might lead to crashes as there
is no proper locking to allow for this operation.
This changes how delayed detect works a bit. Now we call the ThreadInit
callback twice. The first call will only register the counter. The 2nd call
will do all the other setup. This way the counter is registered before the
counters api starts operating in the packet runtime.
Fixes the segv reported in ticket #1018.
This patch is a result of applying the following coccinelle
transformation to suricata sources:
@istested@
identifier x;
statement S1;
identifier func =~ "(SCMalloc|SCStrdup|SCCalloc|SCMallocAligned|SCRealloc)";
@@
x = func(...)
... when != x
- if (x == NULL) S1
+ if (unlikely(x == NULL)) S1
Provide support for icmvp4 and icmpv6 as well. You can now use
alert icmpv4 and
alert icmpv6 as well, apart from the existing
alert icmp, which created a rule that applied to both icmpv4 and icmpv6.
cppcheck:
[detect-fast-pattern.c:1183] -> [detect-fast-pattern.c:1183]: (style) Same expression on both sides of '&'.
[detect-fast-pattern.c:1217] -> [detect-fast-pattern.c:1217]: (style) Same expression on both sides of '&'.
[detect-fast-pattern.c:1449] -> [detect-fast-pattern.c:1449]: (style) Same expression on both sides of '&'.
[detect-fast-pattern.c:1479] -> [detect-fast-pattern.c:1479]: (style) Same expression on both sides of '&'.
[detect-fast-pattern.c:1509] -> [detect-fast-pattern.c:1509]: (style) Same expression on both sides of '&'.
[detect-fast-pattern.c:1539] -> [detect-fast-pattern.c:1539]: (style) Same expression on both sides of '&'.
[detect-fast-pattern.c:1570] -> [detect-fast-pattern.c:1570]: (style) Same expression on both sides of '&'.
[detect-fast-pattern.c:1686] -> [detect-fast-pattern.c:1686]: (style) Same expression on both sides of '&'.
[detect-fast-pattern.c:1716] -> [detect-fast-pattern.c:1716]: (style) Same expression on both sides of '&'.
[detect-fast-pattern.c:1746] -> [detect-fast-pattern.c:1746]: (style) Same expression on both sides of '&'.
[detect-fast-pattern.c:1776] -> [detect-fast-pattern.c:1776]: (style) Same expression on both sides of '&'.
[detect-fast-pattern.c:1806] -> [detect-fast-pattern.c:1806]: (style) Same expression on both sides of '&'.
[detect-fast-pattern.c:1836] -> [detect-fast-pattern.c:1836]: (style) Same expression on both sides of '&'.
[detect-fast-pattern.c:1866] -> [detect-fast-pattern.c:1866]: (style) Same expression on both sides of '&'.
[detect-fast-pattern.c:1896] -> [detect-fast-pattern.c:1896]: (style) Same expression on both sides of '&'.
[detect-fast-pattern.c:1926] -> [detect-fast-pattern.c:1926]: (style) Same expression on both sides of '&'.
[detect-fast-pattern.c:2022] -> [detect-fast-pattern.c:2022]: (style) Same expression on both sides of '&'.
cppcheck said:
[decode-pppoe.c:58] -> [decode-pppoe.c:60]: (performance, inconclusive) Variable 'pppoedh' is reassigned a value before the old one has been used if variable is no semaphore variable.
Take VLAN IDs into account when re-assembling fragments.
Prevents fragments that would otherwise match, but on different
VLANs from being reassembled with each other.
Fix bug #995. Tag time setting was initialized using "usec" field
instead of "sec" field. This led to immediate timing out of tag.
Added proper matching unittests for all tagging types.
Bug #995.
We don't support jabber protocol detection atm. Disable the code check
inside suricata to check if jabber protocol detection is enabled in the
yaml file.
Also updated an error log message for app layer.
In case of recursive call to protocol detection from within protocol
detection, and the recursively invoked stream still hasn't been ack'ed
yet, protocol detection doesn't take place. In such cases we will end up
still calling the app layer with the wrong direction data. Introduce a
check to not call app layer with wrong direction data.
When sockets are re-used reset all relevant vars correctly.
This commit fixes a bug where we were not reseting app proto detection
vars.
While fixing #989, we discovered some other bugs which have also been
fixed, or rather some features which are now updated. One of the feature
update being if we recieve wrong direction data first, we don't reset the
protocol values for the flow. We let the flow retain the detected
values.
Unittests have been modified to accomodate the above change.
Use the tx id stored for each alert to find the correct XFF address
to add to the extra-data field.
In overwrite mode we still only grab the first available XFF addr,
as this address is set in the header preceeding the individual alerts.
Issue #904.
When generating an alert and storing it in the packet, store the tx_id
as well. This way the output modules can log the tx_id and access the
proper tx for logging.
Issue #904.
Some events we retrieved from error messages are flag now, so check
those. Not all can be converted though. These are no longer set:
HTTP_DECODER_EVENT_INVALID_TRANSFER_ENCODING_VALUE_IN_RESPONSE
HTTP_DECODER_EVENT_INVALID_AUTHORITY_PORT
Part of Bug #982.
We have follow TCP RFC (http://tools.ietf.org/html/rfc793#section-3.4).
There is two cases depending on wether the original packet contains a
ACK.
If packet has no ACK, the RST seq number is 0 and the ACK is built the
standard way.
If packet has a ACK, the seq of the RST packet is equal to the ACK of
incoming packet and the ACK is build using packet sequence number and
size of the data.
Regarding standard Ack number, it is computed using seq number of captured
packet added to packet length. Finally 1 is added so we respect the
RFC:
If the ACK control bit is set this field contains the value of the
next sequence number the sender of the segment is expecting to
receive. Once a connection is established this is always sent.
With this patch we have some correct results. With the following rule:
reject ssh any any -> 192.168.56.3 any (msg:"no SSH way"; sid:3; rev:1;)
ssh connection to 192.168.56.3 is correctly resetted on client side.
But this is not perfect. If we have the following rule:
reject tcp any any -> 192.168.56.3 22 (msg:"no way"; sid:2; rev:1;)
then the connection is not resetted on a standard ethernet network. But
if we introduce 20ms delay on packets, then it is correctly resetted.
This is explained when looking at the network trace. The reset is sent
as answer to the SYN packet and it is emitted after the SYN ACK from
server because the exchange is really fast. So this is discarded by the
client OS which has already seen a ACK for the same sequence number.
This should fix#895.
This patch update reject code to send the packet on the interface
it comes from when 'host-mode' is set to 'sniffer-only'. When
'host-mode' is set to 'router', the reject packet is sent via
the routing interface.
This should fix#957.
This variable can be used to indicate to suricata that the host
running is running as a router or is in sniffing only mode.
This will used at least to determine which interfaces are used to
send reject message.
Split threads.h into several files, where each of these files defines
all lock types and macro's.
threads.h defines the normal case
threads-debug.h defines the debug variants
threads-profile.h defines the lock profiling variants
Finally, threads-arch-tile.h moves the Tilera specifics out
Thresholds and suppression can be handled independently. Suppression
only suppresses output, and is not related to Threshold state tracking.
This simplifies mixing suppression and thresholding rules.
Part of the Bug #425 effort.
On Tile, replace pthread_mutex_locks with queued spin locks (ticket
locks) for dataplane processing code. This is safe when running on
dataplane cores with one thread per core. The condition variables are
no-ops when the thread is spinning anyway.
For control plane threads, unix-manager, stats-logs, thread startup,
use pthread_mutex_locks. For these locks replaced SCMutex with SCCtrlMutex
and SCCond with SCCtrlCond.
app-layer-parser.c: In function ‘AppLayerPPTestData’:
app-layer-parser.c:2525:9: error: variable ‘dir’ set but not used [-Werror=unused-but-set-variable]
int dir = 0;
^
Changed the signature sorting code to use a a single merge sort instead
of the multiple pass sorting that was being used. This reduces startup
time on Tile by a factor of 3.
Also replace the user array of pointers to ints with a simpler array of
ints.
Removed a flag parameter introuced earlier to indicate the data
that is first acceptable by the parser. We now use a differently
named parameter to carry out the same activity.
The logic we use currently is if we have already sent some data to
a parser before we figure out we have a proto mismatch, we use the
proto from the first direction from which we have already sent the
data to the parser, else we stick to the the to client direction.
Now we can specify alproto, ip_proto combinations this way
alert dns (ip_proto:[tcp/udp];)
alert ip (app-layer-protocol:dns;)
alert ip (app-layer-protocol:dns; ip_proto:tcp;)
alert tcp (app-layer-protocol:dns:)
so on. Neater than using dnstcp/dnsudp.
This is related to feature #424.
PM and PP phase.
Replace the areas of the code that would otherwise rely on setting/reading
these flags with these macros.
Other minor tweaks to some api calls.
1. Proto detection
2. Parsers
For app layer protocols.
libhtp has now been moved to the section under app-layer.protocols.http,
but we still provide backward compatibility with older conf files.
The parser wasn't carrying out a bounds check on record length while
in the middle of parsing a handshake. As a result we would step onto the
next record header and consider it a part of the current handshake.
- Contains an unittest to test the issue.
- Disable the duplicate parser unittest registration.
The issue came to light through an irregular ssl record, which was
reported by Sebastian Roschke, via CVE-2013-5919.
Thanks to Sebastian Roschke for reporting this issue.
Content strings that are a duplicate of a pattern from another sig, but
have a fast_pattern chop being applied, would end up being assigned the
same pattern id as the duplicate string. But the string supplied to the
mpm would be the chopped string, which might result in the state_table
output_state content entry being over-riden by the the fuller string at
the final state of the smaller content length, because of which during a
match we might end up inspecting the search buffer against the fuller
content pattern, instead of the chopped pattern, which would end up being
an inspection beyond the buffer bounds.
Content strings that are a duplicate of a pattern from another sig, but
have a fast_pattern chop being applied, would end up being assigned the
same pattern id as the duplicate string. But the string supplied to the
mpm would be the chopped string, which might result in the state_table
output_state content entry being over-riden by the the fuller string at
the final state of the smaller content length, because of which during a
match we might end up inspecting the search buffer against the fuller
content pattern, instead of the chopped pattern, which would end up being
an inspection beyond the buffer bounds.
Content strings that are a duplicate of a pattern from another sig, but
have a fast_pattern chop being applied, would end up being assigned the
same pattern id as the duplicate string. But the string supplied to the
mpm would be the chopped string, which might result in the state_table
output_state content entry being over-riden by the the fuller string at
the final state of the smaller content length, because of which during a
match we might end up inspecting the search buffer against the fuller
content pattern, instead of the chopped pattern, which would end up being
an inspection beyond the buffer bounds.
The old behaviour of returning a failure if we found a pattern while
matching on negated content is now changed to continuing searching
for other combinations where we don't find the pattern for the
negated content.
Thanks to Will Metcalf for reporting this.
Move SIMD the implementations of SigMatchSignaturesBuildMatchArray()
for SSE3 and Tile out of detect.c to reduce the size of the file.
Also moved SIMD unit tests to detect-simd.c
Tilera's GCC supports the GCC __sync_ intrinsics.
Increase the size of some atomic variables for better performance on
Tile. The Tile-Gx architecture has native support for 32-bit and
64-bit atomic operations, but not 8-bit and 16-bit, which are emulated
using 32-bit atomics, so changing some 16-bit and 8-bit atomic into
ints improves performance.
Increasing the size of the atomic variables modified in this change
does not increase the total size of the structures in which they
reside because of existing padding requirements. The one case that
would increase the size of the structure (Flow_) was confitionalized
to only change the size on Tile.
compressions, cunning compression that's not what it says it is, etc.
These unittests are tweaked to pass. When libhtp fixes these issues
we will have to reenable them.
Aho-Corasick mpm optimized for Tilera Tile-Gx architecture. Based on the
util-mpm-ac.c code base. The primary optimizations are:
1) Matching function used Tilera specific instructions.
2) Alphabet compression to reduce delta table size to increase cache
utilization and performance.
The basic observation is that not all 256 ASCII characters are used by
the set of multiple patterns in a group for which a DFA is
created. The first reason is that Suricata's pattern matching is
case-insensitive, so all uppercase characters are converted to
lowercase, leaving a hole of 26 characters in the
alphabet. Previously, this hole was simply left in the middle of the
alphabet and thus in the generated Next State (delta) tables.
A new, smaller, alphabet is created using a translation table of 256
bytes per mpm group. Previously, there was one global translation
table for converting upper case to lowercase.
Additional, unused characters are found by creating a histogram of all
the characters in all the patterns. Then all the characters with zero
counts are mapped to one character (0) in the new alphabet. Since
These characters appear in no pattern, they can all be mapped to a
single character and still result in the same matches being
found. Zero was chosen for the value in the new alphabet since this
"character" is more likely to appear in the input. The unused
character always results in the next state being state zero, but that
fact is not currently used by the code, since special casing takes
additional instructions.
The characters that do appear in some pattern are mapped to
consecutive characters in the new alphabet, starting at 1. This
results in a dense packing of next state values in the delta tables
and additionally can allow for a smaller number of columns in that
table, thus using less memory and better packing into the cache. The
size of the new alphabet is the number of used characters plus 1 for
the unused catch-all character.
The alphabet size is rounded up to the next larger power-of-2 so that
multiplication by the alphabet size can be done with a shift. It
might be possible to use a multiply instruction, so that the exact
alphabet size could be used, which would further reduce the size of
the delta tables, increase cache density and not require the
specialized search functions. The multiply would likely add 1 cycle to
the inner search loop.
Since the multiply by alphabet-size is cleverly merged with a mask
instruction (in the SINDEX macro), specialized versions of the
SCACSearch function are generated for alphabet sizes 256, 128, 64, 32
and 16. This is done by including the file util-mpm-ac-small.c
multiple times with a redefined SINDEX macro. A function pointer is
then stored in the mpm context for the search function. For alpha bit
sizes of 8 or smaller, the number of states usually small, so the DFA
is already very small, so there is little difference using the 16
state search function.
The SCACSearch function is also specialized by the size of the value
stored in the next state (delta) tables, either 16-bits or 32-bits.
This removes a conditional inside the Search function. That
conditional is only called once, but doesn't hurt to remove
it. 16-bits are used for up to 32K states, with the sign bit set for
states with matches.
Future optimization:
The state-has-match values is only needed per state, not per next
state, so checking the next-state sign bit could be replaced with
reading a different value, at the cost of an additional load, but
increasing the 16-bit next state span to 64K.
Since the order of the characters in the new alphabet doesn't matter,
the new alphabet could be sorted by the frequency of the characters in
the expected input stream for that multi-pattern matcher. This would
group more frequent characters into the same cache lines, thus
increasing the probability of reusing a cache-line.
All the next state values for each state live in their own set of
cache-lines. With power-of-two sizes alphabets, these don't overlap.
So either 32 or 16 character's next states are loaded in each cache
line load. If the alphabet size is not an exact power-of-2, then the
last cache-line is not completely full and up to 31*2 bytes of that
line could be wasted per state.
The next state table could be transposed, so that all the next states
for a specific character are stored sequentially, this could be better
if some characters, for example the unused character, are much more
frequent.
- Added the Unified2 file format related constants
- Added IPv6 support
- Two modes of operation with a fall-back to "extra-data" mode if
"overwrite" mode is not applicable
- Changed the configuration loading code to handle the new
configuration structure
- When creating the packet that fakes the one that generated the alert
the flow direction wasn't taken into account in overwrite mode
- Fixed BUG_ON condition
Reduced the size of the cached string buffer from 128 to 32, which is
still larger than the largest possible time string, which is 26
characters.
Added a check for the user passing in an output buffer that is smaller
than the cached string. Previously, the code would have copied past
the end of the users buffer.
Log sensible error message when the user doesn't supply a value for
stream.prealloc-sessions or when the values supplied in invalid and
the engine resorts to using a default.
Bug #939: thread name buffers are sized inconsistently
These buffers are now all fixed at 16 bytes.
Bug #914: Having a high number of pickup queues (216+) makes suricata crash
Fixed so that we can now have 256 pickup queues, which is the current built-in
maximum. Improved the error reporting.
Bug #928: Max number of threads
Error reporting improved. Issue was the same as #914.
Cookie is parsed now using uint8_t pointers (inliniac PR comments)
Changed buffer size to a power of 2 (8192) and cookie value extraction function to static (inliniac PR comments)
Added %b for request size (vinfang patch)
Writing "-" if an unknown % directive is used (vinfang patch)
Fixed bug in cookie parser
Fixed format string issue logging literal values
Improve error handling (Victor Julien comments)
(patchset rebased and reworded by Victor Julien)
This patch fixes a compilation failure on Solaris. Compiler does
not support when a function returning void is used in return of
an other function returning void.
When converting a time in seconds (64-bit seconds since 1970) to
Month/Day/Year hours minutes, Suricata calls localtime_r(), which
always aquires a lock and then does complex comutation based on the
current time zone. The time zone can be specified in the TZ
environment variable, which is only parsed the first time it is used,
or from a file. The default file is /etc/localtime. The file is
checked each time to see if it might have changed and is reparsed if
it has changed.
The GLIBC library has a lock inside localtime_r(), which limits
parallelism, which is a problem when the rate of generating alerts is
high, since Suricata generates a new ascii time string for each alert
into fast.log.
This change caches the value returned by localtime_t() and then sets
the seconds within the minute based on the cached start-of-minute
time. All of the values return, expect for the seconds, is constant
within the same minute. Switching to a new seconds could change all
the other values, year, month, day, hour. The cache stores the current
and previous minute values.
The same trick is used in CreateTimeString() for generated time
string. The string, up to the minutes, is cached and then copied into
the result string, followed by printing the new seconds into the
result string.
The seconds within a minute are calculated as the difference in
seconds from the start of the current minute.
There were 8 identical copies of CreateTimeString() in 8 files.
Most used SCLocalTime, to replace localtime_r(), but some did not.
Created one copy in util-time.c.
Makes use of 8-wide byte compare instructions in signature matching.
For allocating aligned memory, _mm_malloc() is SSE only, so added
check for __tile__ to use memalign() instead.
Shows a 13% speed up.
In debug validation mode, it is required to call application layer
parsing and other functions with a lock on flow. This patch updates
the code to do so.
This patch update pf_ring capture to avoid to ask for extended
header. They are only needed when rxonly checksum checks is used
and this is only possible when interface is not a DNA interface.
To be able to split code in functions in main, we need to pass
information about the current running Suricata to functions.
For that we create a structure to store suricata run parameters.
In this patch it allows to separate command line parsing and to
treat internal running mode in a switch just after command line
parsing.
This patch is using the qa/log directory to store the output
of the check. In case of success, the directory is deleted.
In case of failure, the directory remains in place.
This should fixes#910.
The list-keyword and app-layer listing code was spread over all the
init code. This patch introduces a separate file to store non standard
running mode like these ones.
This patch is updating the flow tag system to use the flow
storage API. The tag_list member of Flow structure is suppressed
and its cleaning operation are suppressed too as this is handled
transparently by the flow storage API.
This patch is here to avoid that all modules using a local storage
have to update host code to add their free function. It modifies
previous behavior by calling HostFreeStorage in any case.
The TILE-Gx processor includes a packet processing engine, called
mPIPE, that can deliver packets directly into user space memory. It
handles buffer allocation and load balancing (either static 5-tuple
hashing, or dynamic flow affinity hashing are used here). The new
packet source code is in source-mpipe.c and source-mpipe.h
A new Tile runmode is added that configures the Suricata pipelines in
worker mode, where each thread does the entire packet processing
pipeline. It scales across all the Gx chips sizes of 9, 16, 36 or 72
cores. The new runmode is in runmode-tile.c and runmode-tile.h
The configure script detects the TILE-Gx architecture and defines
HAVE_MPIPE, which is then used to conditionally enable the code to
support mPIPE packet processing. Suricata runs on TILE-Gx even without
mPIPE support enabled.
The Suricata Packet structures are allocated by the mPIPE hardware by
allocating the Suricata Packet structure immediatley before the mPIPE
packet buffer and then pushing the mPIPE packet buffer pointer onto
the mPIPE buffer stack. This way, mPIPE writes the packet data into
the buffer, returns the mPIPE packet buffer pointer, which is then
converted into a Suricata Packet pointer for processing inside
Suricata. When the Packet is freed, the buffer is returned to mPIPE's
buffer stack, by setting ReleasePacket to an mPIPE release specific
function.
The code checks for the largest Huge page available in Linux when
Suricata is started. TILE-Gx supports Huge pages sizes of 16MB, 64MB,
256MB, 1GB and 4GB. Suricata then divides one of those page into
packet buffers for mPIPE.
The code is not yet optimized for high performance. Performance
improvements will follow shortly.
The code was originally written by Tom Decanio and then further
modified by Tilera.
This code has been tested with Tilera's Multicore Developement
Environment (MDE) version 4.1.5. The TILEncore-Gx36 (PCIe card) and
TILEmpower-Gx (1U Rack mount).
In some cases using the vlan id(s) in flow hashing is problematic. Cases
of broken routers have been reported. So this option allows for disabling
the use of vlan id(s) while calculating the flow hash, and in the future
other hashes.
Vlan tracking for flow is enabled by default.
This commit allows handling Packets allocated by different methods.
The ReleaseData function pointer in the Packet structure is replaced
with ReleasePacket function pointer, which is then always called to
release the memory associated with a Packet.
Currently, the only usage of ReleaseData is in AF Packet. Previously
ReleaseData was only called when it was not NULL. To implement the
same functionality as before in AF Packet, a new function is defined
in AF Packet to first call the AFP specific ReleaseData function and
then releases the Packet structure.
Three new general functions are defined for releasing packets in the
default case:
1) PacketFree() - To release a packet alloced with SCMalloc()
2) PacketPoolReturnPacket() - For packets allocated from the Packet Pool.
Calls RECYCLE_PACKET(p)
3) PacketFreeOrRelease() - Calls PacketFree() or PacketPoolReturnPacket()
based on the PKT_ALLOC flag.
Having these functions removes the need to check the PKT_ALLOC flag
when releasing a packet in most cases, since the ReleasePacket
function encodes how the Packet was allocated. The PKT_ALLOC flag is
still set and is needed when AF Packet releases a packet, since it
replaces the ReleasePacket function pointer with its own function and
then calls PacketFreeOfRelease(), which uses the PKT_ALLOC flag.