76 Commits (suricata-3.2.1)

Author SHA1 Message Date
Sascha Steinbiss e6044aaf1c mpm/spm: check for SSSE3 and enable/disable HS
The new Hyperscan 4.4 API provides a function to check for SSSE3
presence at runtime. This allows us to fall back to non-Hyperscan
matchers on systems without SSSE3 even when the suricata executable
is built with Hyperscan support. Addresses Redmine issue #2010.

Signed-off-by: Sascha Steinbiss <sascha@steinbiss.name>
Tested-by: Arturo Borrero Gonzalez <arturo@debian.org>
9 years ago
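The runtime fallback described in e6044aaf1c can be sketched roughly as follows. This is a minimal illustration, not Suricata's code: SketchSelectMatcher, SketchPlatformCheck and the two stand-in CPU probes are hypothetical names, and "ac" is assumed as the fallback matcher. In a real build the callback would presumably wrap the Hyperscan 4.4 platform check (hs_valid_platform()).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: returns nonzero when the running CPU can execute
 * Hyperscan code (i.e. SSSE3 is present). */
typedef int (*SketchPlatformCheck)(void);

/* Choose a matcher name at startup. "ac" is an assumed fallback. */
static const char *SketchSelectMatcher(int built_with_hs, SketchPlatformCheck check)
{
    assert(check != NULL);
    if (built_with_hs && check() != 0)
        return "hs";
    return "ac"; /* built without Hyperscan, or CPU lacks SSSE3 */
}

/* Stand-ins for the two kinds of machine, for demonstration only. */
static int SketchCpuWithSsse3(void) { return 1; }
static int SketchCpuWithoutSsse3(void) { return 0; }
```

Passing the probe as a callback keeps the selection logic testable on machines with and without SSSE3.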
Victor Julien bb0cd0e883 prefilter: rename PatternMatcherQueue datatype
In preparation of the introduction of more general purpose prefilter
engines, rename PatternMatcherQueue to PrefilterRuleStore. The new
engines will fill this structure in a similar way to the current mpm
prefilters.
9 years ago
Victor Julien 4c0ab681f2 mpm: remove Cleanup API call
It's unused by all of the implementations.
9 years ago
Justin Viiret c9d0d6f698 mpm: add "auto" default for mpm-algo
Setting mpm-algo to "auto" will use "hs" if Suricata was built against
Hyperscan, and "ac" otherwise (or "ac-tile" on Tilera platforms).
9 years ago
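The "auto" resolution described in c9d0d6f698 might look like the sketch below. The function name and its two integer flags are assumptions made for illustration; Suricata's actual build-time macros and config handling are not shown in this log.

```c
#include <assert.h>
#include <string.h>

/* Resolve the configured mpm-algo value to a concrete matcher name.
 * built_with_hs and on_tilera are hypothetical stand-ins for build-time
 * feature checks. */
static const char *SketchResolveMpmAlgo(const char *configured,
                                        int built_with_hs, int on_tilera)
{
    assert(configured != NULL);
    if (strcmp(configured, "auto") != 0)
        return configured;          /* an explicit choice is kept as-is */
    if (built_with_hs)
        return "hs";                /* Hyperscan available at build time */
    return on_tilera ? "ac-tile" : "ac";
}
```

For example, SketchResolveMpmAlgo("auto", 0, 1) yields "ac-tile", while an explicit value such as "ac-bs" is returned unchanged.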
Victor Julien 9b6e292a28 mpm: remove unused max pattern len field 10 years ago
Victor Julien 157ca89dd7 mpm: remove useless flag from factory 10 years ago
Victor Julien 4e91f6b1e6 mpm: in factory register, consider name const 10 years ago
Victor Julien eb19fc4c7b mpm: remove unused structure 10 years ago
Victor Julien 9b3d4f7e24 mpm: unify & localize mpm pattern (id) handling
So far, the patterns passed to the mpms used global ids that
were shared among all buffers and directions. This led to a fairly
large pattern id space. As the mpm algos use the pattern ids to
prevent duplicate matching through a pattern-id-based bitarray,
shrinking this space improves performance.

This patch implements this. It sets a flag before adding the pattern
to the mpm ctx, instructing the mpm to ignore the provided pid and
handle pid management itself. This shrinks the
bitarray size.

This is made possible by the previous work that removes the pid logic
from the code.

Next to this, this patch moves the pattern setup stage to common util
functions. This avoids code duplication.

Update ac, ac-bs and ac-ks to use this.
10 years ago
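The effect of 9b3d4f7e24 on the bitarray can be illustrated with a small sketch. SketchMpmCtx, SketchAddPattern and SketchBitarrayBytes are hypothetical names; the point is only that ids handed out locally are dense, so one bit per id covers far fewer bytes than a sparse global id space would.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-context state: hands out dense local pattern ids. */
typedef struct SketchMpmCtx_ {
    uint32_t pattern_cnt;
} SketchMpmCtx;

/* Register a pattern and return its local id, ignoring any external pid. */
static uint32_t SketchAddPattern(SketchMpmCtx *ctx)
{
    assert(ctx != NULL);
    return ctx->pattern_cnt++;
}

/* Bytes needed for a one-bit-per-id array covering ids 0..max_id. */
static size_t SketchBitarrayBytes(uint32_t max_id)
{
    return (size_t)max_id / 8 + 1;
}
```

With three patterns the local ids are 0, 1 and 2, so one byte suffices, whereas a single pattern carrying a global id of 8000 would need 1001 bytes.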
Victor Julien 46734ec41b mpm: remove unused pmq merge function 10 years ago
Victor Julien fa885e1d85 mpm: remove pattern id logic 10 years ago
Victor Julien e48d745ed7 mpm: constify search func args 10 years ago
Victor Julien 14d9ce7b2e detect/mpm: remove unused max_id param from API 10 years ago
Victor Julien 0d3f671b55 detect: constify mpm/detect funcs 10 years ago
Victor Julien 4f8e1f59a6 mpm: remove obsolete mpm algos
Remove: ac-gfbs, wumanber, b2g, b3g.
10 years ago
Justin Viiret 13b87f5aff mpm: add Hyperscan integration
This adds an MPM implementation that uses the Hyperscan regex engine
library from Intel, accessible as the "hs" mpm-algo.
10 years ago
Ken Steele 736ac6a459 Use SigIntId as the type for storing signature IDs (Internal)
Previously using uint32_t, but SigIntId is currently uint16_t, so arrays
will take less memory.
11 years ago
Ken Steele 7835070385 Replace memcpy() in MpmAddSids with copy loop
For the short size of most sids lists, a straight copy loop is faster.
11 years ago
Ken Steele 7a2095d851 In AC-Tile, convert from using pids for indexing to pattern index
Use an MPM specific pattern index, which is simply an index starting
at zero and incremented for each pattern added to the MPM, rather than
the externally provided Pattern ID (pid), since that can be much
larger than the number of patterns. The Pattern ID is shared across all
MPMs. For example, an MPM with one pattern with pid=8000 would result
in a max_pid of 8000, so the pid_pat_list would have 8000 entries.

The pid_pat_list[] is replaced by an array of pattern indexes. The PID is
moved to the SCACTilePatternList as a single value. The PatternList is
also indexed by the Pattern Index.

max_pat_id is no longer needed and mpm_ctx->pattern_cnt is used instead.

The local bitarray is then also indexed by pattern index instead of PID, making
it much smaller. The local bit array sets a bit for each pattern found
for this MPM. It is only kept during one MPM search (stack allocated).

One note: the local bit array is checked first, and if the pattern has already
been found, the search stops checking but still counts a match. This could result in
over-counting matches of case-sensitive patterns, since following case-insensitive
matches will also be counted. For example, finding "Foo" in "foo Foo foo" would
report finding "Foo" 2 times, mis-counting the third word as "Foo".
11 years ago
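The over-counting note in 7a2095d851 can be reproduced with a toy single-pattern scan. This is a simplified model rather than the AC-Tile code: a plain case-insensitive scan stands in for the automaton, and a single `seen` flag stands in for the pattern's bit in the local bit array. All names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive comparison of n bytes. */
static int SketchEqualNoCase(const char *a, const char *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    }
    return 1;
}

/* Count hits of a case-sensitive pattern the way the note describes:
 * once the pattern is marked as seen, later case-insensitive hits are
 * counted without re-checking case. */
static unsigned SketchCountMatches(const char *pat, const char *buf)
{
    assert(pat != NULL && buf != NULL);
    size_t plen = strlen(pat), blen = strlen(buf);
    unsigned count = 0;
    int seen = 0; /* stands in for the pattern's bit in the local bit array */

    if (plen == 0)
        return 0;
    for (size_t i = 0; i + plen <= blen; i++) {
        if (!SketchEqualNoCase(buf + i, pat, plen))
            continue;               /* no hit at this offset */
        if (seen) {
            count++;                /* bit already set: counted unverified */
        } else if (memcmp(buf + i, pat, plen) == 0) {
            seen = 1;               /* first verified case-sensitive hit */
            count++;
        }
    }
    return count;
}
```

Searching for "Foo" in "foo Foo foo" skips the first word (case check fails), verifies the second, and then counts the third without verification, giving 2.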
Ken Steele 1c03eb56d0 Fix bug in MPM rule array handling
In PmqMerge() use MpmAddSids() instead of blindly copying the src
rule list onto the end of the dst rule list, since there might not
be enough room in the dst list. MpmAddSids() will resize the dst array
if needed.

Also add code to MpmAddSids() and MpmAddPid() to better handle the case
that realloc fails to get more space. It first tries 2x the needed
space, but if that fails, it tries for just 1x. If that also fails, resize
returns 0. For MpmAddPid(), if resize fails, the new pid is lost. For
MpmAddSids(), as many SIDs as will fit are added, but some will be
lost.
11 years ago
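The 2x-then-1x fallback described in 1c03eb56d0 could be sketched as below. The allocator is passed in as a function pointer purely so the failure path can be exercised; that parameter, SketchTightRealloc and the function names are assumptions for illustration and do not reflect the real MpmAddSids()/MpmAddPid() signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocator hook with the same shape as realloc(). */
typedef void *(*SketchReallocFn)(void *, size_t);

/* Grow *arr to hold at least `needed` entries. Tries 2x first, then 1x.
 * Returns the number of entries now available, or 0 if both attempts fail
 * (in which case *arr is left untouched and still valid). */
static uint32_t SketchGrowEntries(uint32_t **arr, uint32_t needed, SketchReallocFn re)
{
    assert(arr != NULL && re != NULL);
    if (needed == 0)
        return 0;

    uint32_t want = needed * 2;
    void *p = re(*arr, (size_t)want * sizeof(uint32_t));
    if (p == NULL) {
        want = needed;              /* retry with just what is needed */
        p = re(*arr, (size_t)want * sizeof(uint32_t));
        if (p == NULL)
            return 0;               /* caller drops the new entries */
    }
    *arr = p;
    return want;
}

/* Demo allocator that refuses requests above 64 bytes. */
static void *SketchTightRealloc(void *p, size_t bytes)
{
    return bytes > 64 ? NULL : realloc(p, bytes);
}
```

With the tight allocator, a request for 10 entries fails at 2x (80 bytes) but succeeds at 1x (40 bytes); with plain realloc the same request yields 20 entries.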
Ken Steele ab8b1158b0 Dynamically resize pattern id array as needed
Rather than creating the array of size maxpatid, dynamically resize as needed.
This also handles the case where duplicate pids are added to the array.

Also fix error in bitarray allocation (local version) to always use bitarray_size.
11 years ago
Ken Steele 104a903478 Dynamically resize pmq->rule_id_array
Rather than statically allocate 64K entries in every rule_id_array,
increase the size only when needed. Created a new function MpmAddSids()
to check the size before adding the new sids. If the array is not large
enough, it calls MpmAddSidsResize() that calls realloc and does error
checking. If the realloc fails, it prints an error and drops the new sids
on the floor, which seems better than exiting Suricata.

The size is increased to (current_size + new_count) * 2. This handles the
case where new_count > current_size, which would not be handled by simply
using current_size * 2. It should also be faster than simply reallocing to
current_size + new_count, which would then require another realloc for each
new addition.
11 years ago
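The growth rule from 104a903478, (current_size + new_count) * 2, can be sketched as follows. The struct layout and function names are hypothetical and only loosely modelled on the description above; on allocation failure the sketch prints an error and drops the new sids, as the message describes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the rule id array kept in the pmq. */
typedef struct SketchSidArray_ {
    uint32_t *ids;
    uint32_t cnt;   /* entries in use */
    uint32_t size;  /* entries allocated */
} SketchSidArray;

/* Append n sids, growing the array only when needed.
 * Returns the number of sids actually added. */
static uint32_t SketchAddSids(SketchSidArray *a, const uint32_t *sids, uint32_t n)
{
    assert(a != NULL);
    if (n == 0)
        return 0;
    if (a->cnt + n > a->size) {
        /* (current_size + new_count) * 2 also covers n > current size */
        uint32_t new_size = (a->size + n) * 2;
        uint32_t *p = realloc(a->ids, (size_t)new_size * sizeof(uint32_t));
        if (p == NULL) {
            fprintf(stderr, "failed to grow sid array to %u entries\n",
                    (unsigned)new_size);
            return 0;               /* drop the new sids rather than exit */
        }
        a->ids = p;
        a->size = new_size;
    }
    memcpy(a->ids + a->cnt, sids, (size_t)n * sizeof(uint32_t));
    a->cnt += n;
    return n;
}
```

Starting from an empty array, adding 3 sids allocates 6 entries; the next 3 fit without a realloc, and a further 3 grow the array to 18.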
Ken Steele d03f124445 Implement MPM opt for b2g, b3g, wumanber
Found problems in b2gm and b2gc, so those are removed.
11 years ago
Victor Julien e49d0a5924 MPM: build sid list from MPM matches
Pmq add rule list: Array of uint32_t's to store (internal) sids from the MPM.

AC: store sids in the pattern list, append to Pmq::rule_id_array on match.

Detect: sort rule_id_array after it was set up by the MPM. Rule ids
(Signature::num) are ordered, and the rules with the lowest id are to
be inspected first. As the MPM doesn't fill the array in order, but instead
'randomly', we need this sort step to ensure proper inspection order.
11 years ago
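The sort step from e49d0a5924 can be illustrated with the standard qsort(). Whether the original code used qsort() or a custom sort is not stated in the message, so treat this as one possible approach; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Ascending comparison that avoids overflow from subtraction. */
static int SketchCompareSid(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a;
    uint32_t y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* Put the internal rule ids collected by the MPM into inspection order. */
static void SketchSortRuleIds(uint32_t *ids, size_t n)
{
    assert(ids != NULL || n == 0);
    if (n > 1)
        qsort(ids, n, sizeof(ids[0]), SketchCompareSid);
}
```

After sorting, the rules with the lowest internal ids come first, which is the inspection order the message asks for.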
Anoop Saldanha 429c6388f6 App layer API rewritten. The main files in question are:
app-layer.[ch], app-layer-detect-proto.[ch] and app-layer-parser.[ch].

Things addressed in this commit:
- Brings out a proper separation between protocol detection phase and the
  parser phase.
- The dns app layer now is registered such that we don't use "dnstcp" and
  "dnsudp" in the rules.  A user who previously wrote a rule like this -

  "alert dnstcp....." or
  "alert dnsudp....."

  would now have to use,

  alert dns (ipproto:tcp;) or
  alert udp (app-layer-protocol:dns;) or
  alert ip (ipproto:udp; app-layer-protocol:dns;)

  The same rules extend to another such protocol, dcerpc.
- The app layer parser api now takes in the ipproto while registering
  callbacks.
- The app inspection/detection engine also takes an ipproto.
- All app layer parser functions now take direction as STREAM_TOSERVER or
  STREAM_TOCLIENT, as opposed to 0 or 1, which was taken by some of the
  functions.
- FlowInitialize() and FlowRecycle() now reset proto to 0.  This is
  needed by unittests, which would try to clean the flow, and that would
  call the api, AppLayerParserCleanupParserState(), which would try to
  clean the app state, but the app layer now needs an ipproto to figure
  out which api to internally call to clean the state, and if the ipproto
  is 0, it would return without trying to clean the state.
- A lot of unittests are now updated where if they are using a flow and
  they need to use the app layer, we would set a flow ipproto.
- The "app-layer" section in the yaml conf has also been updated.
12 years ago
Anoop Saldanha a49cbf8a49 Code cleanup.
Use the MpmAddPattern[CS|CI] wrapper to add patterns to the mpm context.

Also use MpmInitCtx() to init the mpm context.
12 years ago
Anoop Saldanha 9c0456ebbe Removed unused function MpmMatcherGetMaxPatternLength. 12 years ago
Ken Steele e05034f5dd New Multi-pattern matcher, ac-tile, optimized for Tile architecture.
Aho-Corasick mpm optimized for Tilera Tile-Gx architecture. Based on the
util-mpm-ac.c code base. The primary optimizations are:
1) Matching function uses Tilera-specific instructions.
2) Alphabet compression to reduce delta table size to increase cache
   utilization and performance.

The basic observation is that not all 256 ASCII characters are used by
the set of multiple patterns in a group for which a DFA is
created. The first reason is that Suricata's pattern matching is
case-insensitive, so all uppercase characters are converted to
lowercase, leaving a hole of 26 characters in the
alphabet. Previously, this hole was simply left in the middle of the
alphabet and thus in the generated Next State (delta) tables.

A new, smaller, alphabet is created using a translation table of 256
bytes per mpm group. Previously, there was one global translation
table for converting upper case to lowercase.

Additional, unused characters are found by creating a histogram of all
the characters in all the patterns. Then all the characters with zero
counts are mapped to one character (0) in the new alphabet. Since
these characters appear in no pattern, they can all be mapped to a
single character and still result in the same matches being
found. Zero was chosen for the value in the new alphabet since this
"character" is more likely to appear in the input. The unused
character always results in the next state being state zero, but that
fact is not currently used by the code, since special casing takes
additional instructions.

The characters that do appear in some pattern are mapped to
consecutive characters in the new alphabet, starting at 1. This
results in a dense packing of next state values in the delta tables
and additionally can allow for a smaller number of columns in that
table, thus using less memory and better packing into the cache. The
size of the new alphabet is the number of used characters plus 1 for
the unused catch-all character.
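
The alphabet compression described so far can be sketched as below. This is a simplified illustration, not the ac-tile source: patterns are taken as NUL-terminated strings (real patterns may contain arbitrary bytes), case folding uses tolower() in the default C locale, and SketchBuildAlphabet is a hypothetical name.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Fill xlate[] so that every byte used by some pattern (after lowercasing)
 * maps to a code 1..k and every unused byte maps to 0.
 * Returns the alphabet size: used characters plus 1 for the catch-all. */
static int SketchBuildAlphabet(const char *const *pats, size_t npats,
                               uint8_t xlate[256])
{
    uint32_t hist[256] = { 0 };
    uint8_t code[256] = { 0 };
    int next = 1;

    assert(pats != NULL && xlate != NULL);

    /* Histogram of lowercased pattern bytes. */
    for (size_t p = 0; p < npats; p++) {
        for (const unsigned char *s = (const unsigned char *)pats[p]; *s != '\0'; s++)
            hist[tolower(*s)]++;
    }
    /* Used characters get consecutive codes starting at 1. */
    for (int c = 0; c < 256; c++) {
        if (hist[c] > 0)
            code[c] = (uint8_t)next++;
    }
    /* Uppercase input bytes share the code of their lowercase form. */
    for (int c = 0; c < 256; c++)
        xlate[c] = code[tolower(c)];
    return next;
}
```

For the patterns "Foo" and "bar" this yields an alphabet of size 6: codes 1 to 5 for a, b, f, o, r and code 0 for everything else.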

The alphabet size is rounded up to the next larger power-of-2 so that
multiplication by the alphabet size can be done with a shift.  It
might be possible to use a multiply instruction, so that the exact
alphabet size could be used, which would further reduce the size of
the delta tables, increase cache density and not require the
specialized search functions. The multiply would likely add 1 cycle to
the inner search loop.
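
A small sketch of the power-of-2 rounding and the resulting shift-based table index. The helper names are hypothetical and this is not the SINDEX macro itself, only the arithmetic the paragraph describes.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest shift such that (1 << shift) >= alphabet_size. */
static uint32_t SketchAlphabetShift(uint32_t alphabet_size)
{
    assert(alphabet_size >= 1 && alphabet_size <= 256);
    uint32_t shift = 0;
    while ((1u << shift) < alphabet_size)
        shift++;
    return shift;
}

/* Index into a flat delta table: state * padded_alphabet_size + symbol,
 * with the multiply done as a shift. Assumes sym < (1 << shift). */
static uint32_t SketchDeltaIndex(uint32_t state, uint32_t shift, uint8_t sym)
{
    return (state << shift) | sym;
}
```

An alphabet of 6 symbols is padded to 8 (shift 3), so state 5 with symbol 3 lands at index 43.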

Since the multiply by alphabet-size is cleverly merged with a mask
instruction (in the SINDEX macro), specialized versions of the
SCACSearch function are generated for alphabet sizes 256, 128, 64, 32
and 16.  This is done by including the file util-mpm-ac-small.c
multiple times with a redefined SINDEX macro. A function pointer is
then stored in the mpm context for the search function. For alphabet
sizes of 8 or smaller, the number of states is usually small, so the DFA
is already very small and there is little difference from using the 16
state search function.

The SCACSearch function is also specialized by the size of the value
stored in the next state (delta) tables, either 16-bits or 32-bits.
This removes a conditional inside the Search function. That
conditional is only called once, but doesn't hurt to remove
it. 16-bits are used for up to 32K states, with the sign bit set for
states with matches.
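
The 16-bit encoding with the sign bit as a match flag can be sketched like this. The names are hypothetical and the exact layout in ac-tile is not shown in the message; the sketch only demonstrates why 15 bits remain for the state id (32K states).

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MATCH_BIT 0x8000u /* top bit of a 16-bit delta table entry */

/* Pack a next-state id and a has-match flag into one 16-bit entry. */
static uint16_t SketchEncodeNext(uint16_t state, int has_match)
{
    assert(state < SKETCH_MATCH_BIT); /* at most 32K states fit in 15 bits */
    return (uint16_t)(state | (has_match ? SKETCH_MATCH_BIT : 0u));
}

/* Extract the next-state id. */
static uint16_t SketchNextState(uint16_t entry)
{
    return (uint16_t)(entry & (SKETCH_MATCH_BIT - 1u));
}

/* Nonzero when the next state has at least one pattern match. */
static int SketchHasMatch(uint16_t entry)
{
    return (entry & SKETCH_MATCH_BIT) != 0;
}
```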

Future optimization:

The state-has-match value is only needed per state, not per next
state, so checking the next-state sign bit could be replaced with
reading a different value, at the cost of an additional load, but
increasing the 16-bit next state span to 64K.

Since the order of the characters in the new alphabet doesn't matter,
the new alphabet could be sorted by the frequency of the characters in
the expected input stream for that multi-pattern matcher. This would
group more frequent characters into the same cache lines, thus
increasing the probability of reusing a cache-line.

All the next state values for each state live in their own set of
cache-lines. With power-of-two sized alphabets, these don't overlap.
So either 32 or 16 characters' next states are loaded in each cache
line load. If the alphabet size is not an exact power-of-2, then the
last cache-line is not completely full and up to 31*2 bytes of that
line could be wasted per state.

The next state table could be transposed, so that all the next states
for a specific character are stored sequentially. This could be better
if some characters, for example the unused character, are much more
frequent.
12 years ago
Anoop Saldanha cdaa13012a fix for #882.
Refactor the code that initializes the cuda mpm environment.
12 years ago
Anoop Saldanha 3c2ddf04c1 Update mpm init ctx to not accept the final cuda_rc_module argument.
It was a part of our older architecture and is no longer used.
12 years ago
Anoop Saldanha 17c763f855 Version 1 of AC Cuda. 12 years ago
Anoop Saldanha b787da5643 Remove all cuda related code in the engine except for the cuda api wrappers 12 years ago
Anoop Saldanha f4ce9011d2 make mpm ctx container de_ctx specific. Also introduce global variable in mpm_ctx. this is a workaround for cleaning non global mpm_ctx's since we now don't supply the de_ctx around the detection engine API 13 years ago
Anoop Saldanha 419cdc8558 support splitting mpm ctxs based on direction v2 14 years ago
Anoop Saldanha 199288309d Support for new MPM ac-bs added 14 years ago
Anoop Saldanha 1389cf6913 update cuda mpm to support per proto mpm contexts. Fix faulty stream mpm usage of cuda 14 years ago
Martin Beyer b1c577f829 cuda streams support in b2g-cuda MPM 15 years ago
Martin Beyer 621815ded0 cuda-packet-batcher timeout supports float values 15 years ago
Anoop Saldanha c734cd1bdd make cuda mpm parameters configurable 15 years ago
Anoop Saldanha 3c73854d2d completely remove populate_mpm_flags. Some indentation changes. Also disable support to avoid double checks inside payload inspection for patterns added to mpm. Also add support to MpmFactory to reclaim a mpm_ctx 15 years ago
Victor Julien 344ea14695 Change mpm hash_size config setting highest to higher as highest wasn't the... highest. Max was higher. Leaving highest as an alias to higher for backwards compatibility. 15 years ago
Anoop Saldanha 0ef684705c support single mpm context distribution across sghs in staging. Also see to it that ac works fine with this setup 15 years ago
Anoop Saldanha 658ff5753d aho-corasick for the cpu. We have 2 versions of ac. The first MPM_AC uses the delta table and the second one MPM_AC_GFBS uses the goto-failure table 15 years ago
Victor Julien 87f88867f4 Further improve B2gc. Add B2gm. Improve memory layout. 15 years ago
Victor Julien 9dfbab42f8 WIP B2gc 15 years ago
Victor Julien 31261e7583 Improve B2g performance by merging pattern array and hash. 15 years ago
Victor Julien a0c1209a44 Inspect the reassembled stream together with the packet payload in the same direction. 15 years ago
William Metcalf 2eef905c07 GPL and Copyright header updates. 15 years ago
Victor Julien e27cefa6f7 Complete conversion of pattern id mpm storage vs sig id storage. 15 years ago
Victor Julien 7a427ec7f4 Switch to pattern id based results checking in the mpm. Move app layer proto detection towards a more signature based approach. 15 years ago