Remove local copies from each MPM file and use include file instead.
Might be better to also add util-memcpy.c rather than inlining it each time,
to get smaller code, since only seems to be used at initialization.
When running on a TILEncore-Gx PCIe card, setting the filetype of fast.log
to pcie, will open a connection over PCIe to a host application caleld
tile-pcie-logd, that receives the alert strings and writes them to a file
on the host. The file name to open is also passed over the PCIe link.
This allows running Suricata on the TILEncore-Gx PCIe card, but have the
alerts logged to the host system's file system efficiently. The PCIe API that
is used is the Tilera Packet Queue (PQ) API which can access PCIe from User
Space, thus avoiding system calls.
Created util-logopenfile-tile.c and util-logopen-tile.h for the TILE
specific PCIe logging functionality.
Using Write() and Close() function pointers in LogFileCtx, which
default to standard write and close for files and sockets, but are
changed to PCIe write and close functions when a PCIe channel is
openned for logging.
Moved Logging contex out of tm-modules.h into util-logopenfile.h,
where it makes more sense. This required including util-logopenfile.h
into a couple of alert-*.c files, which previously were getting the
definitions from tm-modules.h.
The source and Makefile for tile-pcie-logd are added in contrib/tile-pcie-logd.
By default, the file name for fast.log specified in suricata.yaml is used as
the filename on the host. An optional argument to tile-pcie-logd, --prefix=,
can be added to prepend the supplied file path. For example, is the file
in suricata.yaml is specified as "/var/log/fast.log" and --prefix="/tmp",
then the file will be written to "/tmp/var/log/fast.log".
Check for TILERA_ROOT environment variable before building tile_pcie_logd
Building tile_pcie_logd on x86 requires the Tilera MDE for its PCIe libraries
and API header files. Configure now checs for TILERA_ROOT before enabling
builing tile_pcie_logd in contrib/tile_pcie_logd
- restore json output-file.[ch] as output-json-file.[ch] after rebase conflict
- fix Makefile.am after merge conflict
- some dev-log-api-v4.0 rebase json fallout cleanup
A new logger API for registering file storage handlers. Where the
FileLog handler is called once per file, this handler will be called
for each data chunk so that storing the entire file is possible.
The logger call in the API is as follows:
typedef int (*FiledataLogger)(ThreadVars *, void *thread_data,
const Packet *, const File *, const FileData *, uint8_t flags);
All data is const, thus should be read only. The final flags field
is used to indicate to the caller that the file is new, or if it's
being closed.
Files use an internal unique id 'file_id' which can be used by the
loggers to create unique file names. This id can use the 'waldo'
feature of the log-filestore module. This patch moves that waldo
loading and storing logic to this API's implementation. A new
configuration directive 'file-store-waldo: <filename>' is added,
but the existing waldo settings will also continue to work.
This patch introduces a new logging API for logging extracted file info.
It allows for registration of a callback that is called once per file:
when it's considered 'closed'.
Users of this API register their Log Function through:
OutputRegisterFileModule()
The API uses a magic settings globally. This might be changed later.
This patch introduces a new API for logging transactions from
tx-aware app layer protocols. It runs all the registered loggers
from a single thread module. This thread module takes care of the
transaction handling and flow locking. The logger just gets a
transaction to log out.
All loggers for a protocol will be run at the same time, so there
will not be any timing differences.
Loggers will no longer act as Thread Modules in the strictest sense.
The Func is NULL, and SetupOuputs no longer attaches them to the
thread module chain individually. Instead, after registering through
OutputRegisterTxModule, the setup data is used in the single logging
module.
The logger (LogFunc) is called for each transaction once, at the end
of the transaction.
This patch introduces a new API for outputs that log based on the
packet, such as alert outputs. In converts fast-log to the new API.
The API gets rid of the concept of each logger being a thread module,
but instead there is one thread module that runs all packet loggers.
Through the registration function OutputRegisterPacketModule a log
module can register itself to be considered for each packet.
Each logger registers itself to this new API with 2 functions and the
OutputCtx object that was already used in the old implementation.
The function pointers are:
LogFunc: the log function
ConditionFunc: this function is called before the LogFunc and only
if this returns TRUE the LogFunc is called.
For a simple alert logger like fast-log, the condition function will
simply return TRUE if p->alerts.cnt > 0.
Move app layer event handling into app-layer-event.[ch].
Convert 'Set' macro's to functions.
Get rid of duplication in Set and SetRaw. Set now calls SetRaw.
Fix potentential int overflow condition in the event storage.
Update callers.
This patch introduces wrapper functions around allocation functions
to be able to have a global HTP memcap. A simple subsitution of
function was not enough because allocated size needed to be known
during freeing and reallocation.
The value of the memcap can be set in the YAML and is left by default
to unlimited (0) to avoid any surprise to users.
Split threads.h into several files, where each of these files defines
all lock types and macro's.
threads.h defines the normal case
threads-debug.h defines the debug variants
threads-profile.h defines the lock profiling variants
Finally, threads-arch-tile.h moves the Tilera specifics out
Move SIMD the implementations of SigMatchSignaturesBuildMatchArray()
for SSE3 and Tile out of detect.c to reduce the size of the file.
Also moved SIMD unit tests to detect-simd.c
Aho-Corasick mpm optimized for Tilera Tile-Gx architecture. Based on the
util-mpm-ac.c code base. The primary optimizations are:
1) Matching function used Tilera specific instructions.
2) Alphabet compression to reduce delta table size to increase cache
utilization and performance.
The basic observation is that not all 256 ASCII characters are used by
the set of multiple patterns in a group for which a DFA is
created. The first reason is that Suricata's pattern matching is
case-insensitive, so all uppercase characters are converted to
lowercase, leaving a hole of 26 characters in the
alphabet. Previously, this hole was simply left in the middle of the
alphabet and thus in the generated Next State (delta) tables.
A new, smaller, alphabet is created using a translation table of 256
bytes per mpm group. Previously, there was one global translation
table for converting upper case to lowercase.
Additional, unused characters are found by creating a histogram of all
the characters in all the patterns. Then all the characters with zero
counts are mapped to one character (0) in the new alphabet. Since
These characters appear in no pattern, they can all be mapped to a
single character and still result in the same matches being
found. Zero was chosen for the value in the new alphabet since this
"character" is more likely to appear in the input. The unused
character always results in the next state being state zero, but that
fact is not currently used by the code, since special casing takes
additional instructions.
The characters that do appear in some pattern are mapped to
consecutive characters in the new alphabet, starting at 1. This
results in a dense packing of next state values in the delta tables
and additionally can allow for a smaller number of columns in that
table, thus using less memory and better packing into the cache. The
size of the new alphabet is the number of used characters plus 1 for
the unused catch-all character.
The alphabet size is rounded up to the next larger power-of-2 so that
multiplication by the alphabet size can be done with a shift. It
might be possible to use a multiply instruction, so that the exact
alphabet size could be used, which would further reduce the size of
the delta tables, increase cache density and not require the
specialized search functions. The multiply would likely add 1 cycle to
the inner search loop.
Since the multiply by alphabet-size is cleverly merged with a mask
instruction (in the SINDEX macro), specialized versions of the
SCACSearch function are generated for alphabet sizes 256, 128, 64, 32
and 16. This is done by including the file util-mpm-ac-small.c
multiple times with a redefined SINDEX macro. A function pointer is
then stored in the mpm context for the search function. For alpha bit
sizes of 8 or smaller, the number of states usually small, so the DFA
is already very small, so there is little difference using the 16
state search function.
The SCACSearch function is also specialized by the size of the value
stored in the next state (delta) tables, either 16-bits or 32-bits.
This removes a conditional inside the Search function. That
conditional is only called once, but doesn't hurt to remove
it. 16-bits are used for up to 32K states, with the sign bit set for
states with matches.
Future optimization:
The state-has-match values is only needed per state, not per next
state, so checking the next-state sign bit could be replaced with
reading a different value, at the cost of an additional load, but
increasing the 16-bit next state span to 64K.
Since the order of the characters in the new alphabet doesn't matter,
the new alphabet could be sorted by the frequency of the characters in
the expected input stream for that multi-pattern matcher. This would
group more frequent characters into the same cache lines, thus
increasing the probability of reusing a cache-line.
All the next state values for each state live in their own set of
cache-lines. With power-of-two sizes alphabets, these don't overlap.
So either 32 or 16 character's next states are loaded in each cache
line load. If the alphabet size is not an exact power-of-2, then the
last cache-line is not completely full and up to 31*2 bytes of that
line could be wasted per state.
The next state table could be transposed, so that all the next states
for a specific character are stored sequentially, this could be better
if some characters, for example the unused character, are much more
frequent.
This patch is using the qa/log directory to store the output
of the check. In case of success, the directory is deleted.
In case of failure, the directory remains in place.
This should fixes#910.
The list-keyword and app-layer listing code was spread over all the
init code. This patch introduces a separate file to store non standard
running mode like these ones.
The TILE-Gx processor includes a packet processing engine, called
mPIPE, that can deliver packets directly into user space memory. It
handles buffer allocation and load balancing (either static 5-tuple
hashing, or dynamic flow affinity hashing are used here). The new
packet source code is in source-mpipe.c and source-mpipe.h
A new Tile runmode is added that configures the Suricata pipelines in
worker mode, where each thread does the entire packet processing
pipeline. It scales across all the Gx chips sizes of 9, 16, 36 or 72
cores. The new runmode is in runmode-tile.c and runmode-tile.h
The configure script detects the TILE-Gx architecture and defines
HAVE_MPIPE, which is then used to conditionally enable the code to
support mPIPE packet processing. Suricata runs on TILE-Gx even without
mPIPE support enabled.
The Suricata Packet structures are allocated by the mPIPE hardware by
allocating the Suricata Packet structure immediatley before the mPIPE
packet buffer and then pushing the mPIPE packet buffer pointer onto
the mPIPE buffer stack. This way, mPIPE writes the packet data into
the buffer, returns the mPIPE packet buffer pointer, which is then
converted into a Suricata Packet pointer for processing inside
Suricata. When the Packet is freed, the buffer is returned to mPIPE's
buffer stack, by setting ReleasePacket to an mPIPE release specific
function.
The code checks for the largest Huge page available in Linux when
Suricata is started. TILE-Gx supports Huge pages sizes of 16MB, 64MB,
256MB, 1GB and 4GB. Suricata then divides one of those page into
packet buffers for mPIPE.
The code is not yet optimized for high performance. Performance
improvements will follow shortly.
The code was originally written by Tom Decanio and then further
modified by Tilera.
This code has been tested with Tilera's Multicore Developement
Environment (MDE) version 4.1.5. The TILEncore-Gx36 (PCIe card) and
TILEmpower-Gx (1U Rack mount).
In preparation of the libhtp upgrade, move all libhtp related conditionals
to configure. This allows for one set of build scripts that works regardless
of the presence of a local libhtp dir.