Performance Analysis
====================

Performance issues can have many different causes. In this section we will
guide you through some options: the first part covers basic steps and
introduces some helpful tools, while the second part covers more in-depth
explanations and corner cases.

System Load
-----------

The first step should be to check the system load. Run a top tool like
**htop** to get an overview of the system load and to see if there is a
bottleneck in the traffic distribution. For example, if you can see that only
a small number of CPU cores hit 100% all the time while others don't, it
could be related to bad traffic distribution or to elephant flows, like in
the screenshot where one core peaks due to one big elephant flow.

.. image:: analysis/htopelephantflow.png

If all cores are at peak load, the system might be too slow for the traffic
load or misconfigured. Also keep an eye on memory usage: if the actual memory
usage is too high and the system needs to swap, this will also result in low
performance.

The load will give you a first indication of where to start debugging the
specific parts we describe in more detail in the second part.

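If **htop** is not available, a plain **top** in threads mode limited to the
Suricata process gives a similar per-thread view (a minimal sketch, assuming
a single running Suricata instance)::

  sudo top -H -p $(pidof suricata)
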
Logfiles
--------

The next step is to check all the log files, with a focus on **stats.log**
and **suricata.log**, for any obvious issues. The most obvious indicator is
the **capture.kernel_drops** value: ideally it would not show up at all, but
it should at least stay below 1% of the **capture.kernel_packets** value, as
high drop rates can lead to a reduced number of events and alerts.

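To keep an eye on those counters, you can extract them from **stats.log** (a
sketch; the log location depends on your setup, **/var/log/suricata** is
assumed here)::

  grep -E "capture.kernel_(packets|drops)" /var/log/suricata/stats.log | tail
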
If **memcap** counters are seen in the stats, the corresponding memcap values
in the configuration could be increased. Keep in mind that this results in
higher memory usage and should be taken into account when the settings are
changed.

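A quick check for whether any memcap was hit (same path assumption as
above)::

  grep -i memcap /var/log/suricata/stats.log | tail
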
Don't forget to check the system logs as well; even a quick **dmesg** run can
show potential issues.

Suricata Load
-------------

Besides the system load, another indicator of potential performance issues is
the load of Suricata itself. A helpful tool for that is **perf**, which helps
to spot performance issues. Make sure you have it installed and that the
debug symbols for Suricata are installed as well, or the output won't be very
helpful. This output is also helpful when you report performance issues, as
the Suricata development team can use it to narrow down possible issues.

::

  sudo perf top -p $(pidof suricata)

If specific function calls show up at the top in red, it's a hint that those
are the bottlenecks. For example, if you see **IPOnlyMatchPacket**, it can be
a result of either high drop rates or incomplete flows, both of which result
in decreased performance. To look into performance issues on a specific
thread, you can pass **-t TID** to perf top. In other cases, the functions at
the top can hint that a specific protocol parser is used a lot; you can then
either try to debug a performance bug or try to filter the related traffic.

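To find the thread IDs (TIDs) of Suricata's threads for **perf top -t**, you
can list them with **ps** (a sketch; thread names depend on your runmode and
configuration, and **$TID** is a placeholder)::

  ps -T -p $(pidof suricata)
  sudo perf top -t $TID
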
.. image:: analysis/perftop.png

In general, try to experiment with the different configuration options that
Suricata provides, with a focus on the options described in
:doc:`high-performance-config`.

Traffic
-------

In most cases where the hardware is fast enough to handle the traffic but the
drop rate is still high, the problem is related to specific traffic issues.

Basics
^^^^^^

Some of the basic checks are:

- Check if the traffic is bidirectional; if it's mostly unidirectional you're
  missing relevant parts of the flow (see the **tshark** example at the
  bottom). Another indicator could be a big discrepancy between the SYN and
  SYN-ACK as well as the RST counters in the Suricata stats.

- Check for encapsulated traffic: while GRE, MPLS etc. are supported, they
  could also lead to performance issues, especially if there are several
  layers of encapsulation.

- Use tools like **iftop** to spot elephant flows (see the example after this
  list). Flows with a rate of over 1 Gbit/s for a long time can peg one CPU
  core at 100% and increase the drop rate, while it usually doesn't make
  sense to dig deep into this traffic.

- Another approach to narrow down issues is the use of a **bpf filter**. For
  example, filter out all HTTPS traffic with **not port 443** to exclude
  traffic that might be problematic, or look only at one specific port with
  **port 25** if you expect issues with a specific protocol. See
  :doc:`ignoring-traffic` for more details.

- If VLAN is used, it might help to disable **vlan.use-for-tracking** in
  scenarios where only one direction of the flow has the VLAN tag.

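As mentioned in the list, **iftop** helps to spot elephant flows on the wire.
A minimal sketch (**-n** and **-N** disable name and port resolution, **-P**
shows ports; the interface is a placeholder)::

  sudo iftop -i $INTERFACE -nNP
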
Advanced
^^^^^^^^

There are several advanced steps and corner cases when it comes to a deep
dive into the traffic.

If VLAN QinQ (IEEE 802.1ad) is used, be very cautious if you use
**cluster_qm** in combination with Intel drivers and the AF_PACKET runmode.
While the RFC expects ethertype 0x8100 and 0x88A8 in this case (see
https://en.wikipedia.org/wiki/IEEE_802.1ad), most implementations only add
0x8100 on each layer. If the outer layer has the same VLAN tag but the inner
layers have different VLAN tags, the traffic will still end up in the same
queue in **cluster_qm** mode. This was observed with the i40e driver up to
version 2.8.20 and firmware versions up to 7.00; feel free to report if newer
versions have fixed this (see https://suricata-ids.org/support/).

If you want to use **tshark** to get an overview of the traffic direction,
use this command:

::

  sudo tshark -i $INTERFACE -q -z conv,ip -a duration:10

The output will show all flows seen within 10 seconds. If you see 0 for one
direction, you have unidirectional traffic and are thus, for example, missing
the ACK packets. Since Suricata tries to work on flows, this has a rather big
impact on visibility. Focus on fixing the unidirectional traffic; if that's
not possible at all, you can enable **async-oneside** in the **stream**
configuration section.

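A minimal **suricata.yaml** excerpt for that setting (a sketch; all other
stream options are assumed to keep their existing values)::

  stream:
    async-oneside: true
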
Check for other unusual or complex protocols that aren't supported very well.
You can try to filter those to see if it has any impact on the performance.
In this example we filter Cisco Fabric Path (ethertype 0x8903) with the bpf
filter **not ether proto 0x8903**, as it is assumed to cause a performance
issue (see https://redmine.openinfosecfoundation.org/issues/3637).

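To test such a filter quickly, it can be passed to Suricata on the command
line as the last argument (a sketch; config path and interface are
placeholders)::

  sudo suricata -c suricata.yaml -i $INTERFACE 'not ether proto 0x8903'
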
Elephant Flows
^^^^^^^^^^^^^^

The so-called elephant flows or traffic spikes are quite difficult to deal
with. In most cases those are big file transfers or backup traffic, and it's
not feasible to decode the whole traffic. From a network security monitoring
perspective, it's enough to log the metadata of such a flow and do packet
inspection at its beginning, but not for the whole flow.

If you can spot specific flows as described above, try to filter those. The
easiest solution would be a bpf filter, but that would still come with a
performance impact. Ideally you filter such traffic even earlier on the
driver or NIC level (see eBPF/XDP), or even before it reaches the system
where Suricata is running. Some commercial packet brokers support such
filtering, where it's called **Flow Shunting** or **Flow Slicing**.

Rules
-----

The ruleset plays an important role in detection, but also in the performance
capability of Suricata. Thus it's recommended to look into the impact of the
enabled rules as well.

If you run into performance issues and struggle to narrow them down, start by
running Suricata without any rules enabled and use the tools explained in the
first part again. Keep in mind that even without signatures enabled, Suricata
still does all the decoding and traffic analysis, so a fair amount of load
should still be seen. If the load is still very high and drops are seen, even
though the hardware should be capable of dealing with such traffic loads, you
should dig into whether there is a specific traffic issue (see above) or
report the performance issue so it can be investigated (see
https://suricata-ids.org/support/).

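One quick way to run without any signatures is to point Suricata at an empty
exclusive rule file (a sketch; **-S** loads only the given file instead of
the rule files from the configuration)::

  sudo suricata -c suricata.yaml -i $INTERFACE -S /dev/null
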
Suricata also provides several traffic-related signatures that can be enabled
for testing to spot specific traffic issues. These are found in the **rules**
folder; you should start with **decoder-events.rules**,
**stream-events.rules** and **app-layer-events.rules**.

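They can be enabled by adding them to the **rule-files** section of
**suricata.yaml** (an excerpt sketch; the rule path and the rest of the file
are assumed to be set up already)::

  rule-files:
    - decoder-events.rules
    - stream-events.rules
    - app-layer-events.rules
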
It can also be helpful to use :doc:`rule-profiling` and/or
:doc:`packet-profiling` to find problematic rules or traffic patterns. This
is achieved by compiling Suricata with **--enable-profiling**, but keep in
mind that this has an impact on performance and should only be used for
troubleshooting.

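A profiling-enabled build can look like this (a sketch; any other configure
flags you normally use still apply)::

  ./configure --enable-profiling
  make
  sudo make install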