Performance Analysis
====================

There are many possible reasons for performance issues. In this section we
will guide you through some options. The first part will cover basic steps
and introduce some helpful tools. The second part will cover more in-depth
explanations and corner cases.

System Load
-----------

The first step should be to check the system load. Run a top tool like
**htop** to get an overview of the system load and to see if there is a
bottleneck in the traffic distribution. For example, if only a small number
of CPU cores hit 100% all the time while others don't, it could be related to
bad traffic distribution or to elephant flows, as in the screenshot where one
core peaks due to one big elephant flow.

.. image:: analysis/htopelephantflow.png

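If you prefer a non-interactive view, the per-core utilization can also be
sampled on the command line, for example with **mpstat** from the sysstat
package:

::

    # print per-core CPU utilization once per second
    mpstat -P ALL 1
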
If all cores are at peak load the system might be too slow for the traffic
load or misconfigured. Also keep an eye on the memory usage: if the actual
memory usage is too high and the system needs to swap, it will also result in
low performance.

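A quick way to check memory and swap usage on most Linux systems:

::

    # overall memory and swap usage
    free -h
    # the si/so columns show ongoing swap activity
    vmstat 1
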
The load will give you a first indication of where to start debugging; the
specific parts are described in more detail in the second part.

Logfiles
--------

The next step would be to check all the log files, with a focus on
**stats.log** and **suricata.log**, for any obvious issues. The most obvious
indicator is the **capture.kernel_drops** value: ideally it would not show up
at all, but it should at least stay below 1% of the **capture.kernel_packets**
value, as high drop rates lead to a reduced number of events and alerts.

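For example, to follow the relevant capture counters over time (assuming the
default log directory):

::

    grep -E 'capture\.kernel_(packets|drops)' /var/log/suricata/stats.log | tail
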
If **memcap** counters are seen in the stats, the corresponding memcap values
in the configuration could be increased. Keep in mind that this results in
higher memory usage and should be taken into account when the settings are
changed.

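A simple check to see whether any memcap was hit (again assuming the default
log directory):

::

    grep memcap /var/log/suricata/stats.log | tail
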
Don't forget to check the system logs as well; even a quick **dmesg** run can
show potential issues.

Suricata Load
-------------

Besides the system load, another indicator for potential performance issues
is the load of Suricata itself. A helpful tool to spot such issues is
**perf**. Make sure it is installed, and also install the debug symbols for
Suricata, or the output won't be very helpful. This output is also helpful
when you report performance issues, as the Suricata development team can use
it to narrow down possible causes.

::

    sudo perf top -p $(pidof suricata)

If you see specific function calls at the top in red, it's a hint that those
are the bottlenecks. For example, seeing **IPOnlyMatchPacket** there can be a
result of either high drop rates or incomplete flows, both of which decrease
performance. To look into the performance of one specific thread you can pass
**-t TID** to perf top. In other cases the top functions may hint that a
specific protocol parser sees a lot of traffic; you can then either try to
debug a performance bug or try to filter the related traffic.

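To find the thread IDs of the individual Suricata threads, one option is
**ps**; a minimal sketch (replace <TID> with a real thread ID from the
output):

::

    # list all threads of the running Suricata process
    ps -T -p $(pidof suricata)
    # profile a single thread
    sudo perf top -t <TID>
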
.. image:: analysis/perftop.png

In general, try experimenting with the different configuration options that
Suricata provides, with a focus on the options described in
:doc:`high-performance-config`.

Traffic
-------

In most cases where the hardware is fast enough to handle the traffic but the
drop rate is still high, the cause is a specific traffic issue.

Basics
^^^^^^

Some of the basic checks are:

- Check if the traffic is bidirectional; if it's mostly unidirectional you're
  missing relevant parts of the flow (see the **tshark** example at the
  bottom). Another indicator could be a big discrepancy between the SYN and
  SYN-ACK as well as the RST counters in the Suricata stats.

- Check for encapsulated traffic: while GRE, MPLS etc. are supported, they
  could also lead to performance issues, especially if there are several
  layers of encapsulation.

- Use tools like **iftop** to spot elephant flows. Flows with a rate of over
  1Gbit/s for a long time can result in one CPU core peaking at 100% all the
  time and an increasing drop rate, while it doesn't make sense to dig deep
  into this traffic.

- Another approach to narrow down issues is the use of **bpf filters**. For
  example, filter out all HTTPS traffic with **not port 443** to exclude
  traffic that might be problematic, or just look into one specific port with
  **port 25** if you expect issues with a specific protocol (see the
  command-line sketch after this list). See :doc:`ignoring-traffic` for more
  details.

- If VLAN is used it might help to disable **vlan.use-for-tracking** in
  scenarios where only one direction of the flow has the VLAN tag.

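As a sketch for the bpf filter approach above, a capture filter can be passed
on the command line; the interface name and configuration path here are just
examples:

::

    # run with a bpf filter that excludes HTTPS traffic
    suricata -c /etc/suricata/suricata.yaml -i eth0 'not port 443'
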
Advanced
^^^^^^^^

There are several advanced steps and corner cases when it comes to a deep
dive into the traffic.

If VLAN QinQ (IEEE 802.1ad) is used, be very cautious when you use
**cluster_qm** in combination with Intel drivers and the AF_PACKET runmode.
While the standard expects ethertype 0x88A8 on the outer and 0x8100 on the
inner layer in this case (see https://en.wikipedia.org/wiki/IEEE_802.1ad),
most implementations only add 0x8100 on each layer. If the outer layer always
carries the same VLAN tag while only the inner tags differ, all traffic will
still end up in the same queue in **cluster_qm** mode. This was observed with
the i40e driver up to version 2.8.20 and firmware versions up to 7.00; feel
free to report if newer versions have fixed this (see
https://suricata-ids.org/support/).

If you want to use **tshark** to get an overview of the traffic direction,
use this command:

::

    sudo tshark -i $INTERFACE -q -z conv,ip -a duration:10

The output will show you all flows seen within 10 seconds. If you see 0 for
one direction you have unidirectional traffic, which means, for example, that
you don't see the ACK packets. Since Suricata tries to work on flows, this
has a rather big impact on visibility. Focus on fixing the unidirectional
traffic; if that's not possible at all, you can enable **async-oneside** in
the **stream** configuration setting.

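The SYN / SYN-ACK / RST discrepancy mentioned in the basic checks can be read
from the stats as well; the counter names below are assumptions based on
recent Suricata versions:

::

    grep -E 'tcp\.(syn|synack|rst)' /var/log/suricata/stats.log | tail
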
Check for other unusual or complex protocols that aren't supported very well.
You can try to filter those out to see if that has any impact on the
performance. In this example we filter Cisco Fabric Path (ethertype 0x8903)
with the bpf filter **not ether proto 0x8903**, as it's suspected to cause a
performance issue (see
https://redmine.openinfosecfoundation.org/issues/3637).

Elephant Flows
^^^^^^^^^^^^^^

The so-called elephant flows, or traffic spikes, are quite difficult to deal
with. In most cases those are big file transfers or backup traffic, and it's
not feasible to decode the whole traffic. From a network security monitoring
perspective it's enough to log the metadata of such a flow and inspect the
packets at its beginning, but not the whole flow.

If you can spot specific flows as described above, try to filter those. The
easiest solution would be a bpf filter, but that still has a performance
impact. Ideally you can filter such traffic even sooner, on the driver or NIC
level (see eBPF/XDP), or even before it reaches the system where Suricata is
running. Some commercial packet brokers support such filtering, where it's
called **Flow Shunting** or **Flow Slicing**.

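As a minimal sketch, a single known elephant flow (hypothetical endpoints and
interface) could be excluded with a bpf filter like this:

::

    # 10.0.0.5 <-> 10.0.0.9 on port 873 is a hypothetical rsync backup flow
    suricata -c /etc/suricata/suricata.yaml -i eth0 'not (host 10.0.0.5 and host 10.0.0.9 and port 873)'
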
Rules
-----

The ruleset plays an important role in detection, but also in the performance
capability of Suricata. Thus it's recommended to look into the impact of the
enabled rules as well.

If you run into performance issues and struggle to narrow them down, start
with running Suricata without any rules enabled and use the tools explained
in the first part again. Keep in mind that even without signatures enabled
Suricata still does all the decoding and traffic analysis, so a fair amount
of load should still be seen. If the load is still very high, drops are seen,
and the hardware should be capable of dealing with such traffic loads, you
should dig deeper to see if there is a specific traffic issue (see above) or
report the performance issue so it can be investigated (see
https://suricata-ids.org/support/).

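One way to run without any rules is to point the exclusive rule file option
**-S** at an empty file; a sketch assuming an AF_PACKET setup on eth0:

::

    # -S loads the given rule file exclusively, so /dev/null means no rules
    suricata -c /etc/suricata/suricata.yaml -i eth0 -S /dev/null
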
Suricata also provides several traffic-related signatures in the rules folder
that can be enabled for testing to spot specific traffic issues. Those are
found in the **rules** folder, and you should start with
**decoder-events.rules**, **stream-events.rules** and
**app-layer-events.rules**.

It can also be helpful to use :doc:`rule-profiling` and/or
:doc:`packet-profiling` to find problematic rules or traffic patterns. This
is achieved by compiling Suricata with **--enable-profiling**, but keep in
mind that this has an impact on performance and should only be used for
troubleshooting.

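A minimal sketch of such a build, with any other configure options your setup
needs added as usual:

::

    ./configure --enable-profiling
    make
    sudo make install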