Today, most open source network flow tools lack a flexible, easy-to-use interface. Using Logstash’s built-in netflow codec, Kibana’s great-looking and powerful web interface, and the flexibility of Elastic, you can build a tool that rivals commercial flow-collecting products.

 

Kibana – Analyzing data

Discover

Discover is the main screen in Kibana and a good place to test and build queries. You can customize the displayed columns and time range, look at the top values of each field in the left column, and open a flow record to see its complete details. From here you can save searches, or create queries and pin them to use later in visualizations.

[Figure: Kibana Discover screen]

Visualize

There are many ways to present your data in Kibana. Our top three are line charts, pie charts, and map views, and there are a few other visualizations as well.

Line charts are great for displaying bits per second (bps) or packets per second (pps) over time. The line graph below shows total bps for the matched country (Poland) over the past 5 days. When you choose a per-second interval, Kibana may switch to a larger bucket interval to limit the number of buckets, but it still does the math to show the correct per-second value.
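The per-second math is straightforward: a bucket that is `interval_s` seconds wide and sums `total_bytes` represents `total_bytes * 8 / interval_s` bits per second, so a coarser interval still yields a correct per-second value, only smoothed. A minimal Python sketch of that calculation, assuming each flow is a plain dict with a Unix `timestamp` and an `in_bytes` count (illustrative field names, not the exact document layout in Elasticsearch):

```python
from collections import defaultdict


def bps_per_bucket(flows, interval_s):
    """Sum bytes per time bucket and convert each bucket to bits per second.

    `flows` holds dicts with a Unix `timestamp` in seconds and an `in_bytes`
    count; both field names are illustrative assumptions.
    """
    bytes_per_bucket = defaultdict(int)
    for flow in flows:
        # Align each flow to the start of its bucket, as a date histogram does.
        bucket = int(flow["timestamp"] // interval_s) * interval_s
        bytes_per_bucket[bucket] += flow["in_bytes"]
    # Dividing by the bucket width turns any interval back into a per-second rate.
    return {b: total * 8 / interval_s for b, total in sorted(bytes_per_bucket.items())}


flows = [
    {"timestamp": 0, "in_bytes": 1500},
    {"timestamp": 30, "in_bytes": 1500},
    {"timestamp": 65, "in_bytes": 3000},
]
print(bps_per_bucket(flows, 60))  # {0: 400.0, 60: 400.0}
```

Both 60-second buckets report 400 bps, while 1-second buckets would show 12,000 to 24,000 bps spikes at the moments the flows were recorded; the total volume is the same either way.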

Pie charts are well suited to showing data relative to the entire result set, and they can even show multiple layers of data. The inner layer of the pie chart below shows relative traffic per country. The outer layer shows the different WordPress.com data centers where the traffic was received.

The map is useful for seeing how anycast routing is performing. In the map below, the query matches traffic received at the Sydney data center over the past 24 hours. You can see a few spots in North America; in this case, the cause is international networks that don’t have correct geolocation info for their IP addresses. The map data is only as good as the geolocation source.

There are many other visualization options and the only limit is your imagination!

One real-world example of the visualizations working together is identifying networks that aren’t routing traffic to the nearest data center. In the past, we’ve found networks peering with foreign route servers and preferring those routes over much closer data centers. Using the Kibana map, we can select a problematic geographical area and create a filter focused on just those sources. The query is then used in a pie chart showing the top source ASNs and some example source/destination IPs. Filtering further, you can create a line chart to determine how many Mbps could potentially move to another, closer PoP. This is just one of nearly limitless use cases for collecting and analyzing your netflow data with ELK.
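The filter-then-rank step of that workflow can be sketched in a few lines of Python. This illustrates the logic only, assuming flows are dicts that already carry the source ASN and GeoIP coordinates under hypothetical flat field names; in Kibana the same result comes from a map filter plus a terms aggregation in the pie chart.

```python
from collections import Counter


def top_source_asns(flows, south, west, north, east, n=5):
    """Rank source ASNs by bytes for flows whose source geolocation falls
    inside a latitude/longitude box, similar to a filter drawn on the map.

    Field names (src_as, src_lat, src_lon, in_bytes) are hypothetical, and
    boxes that cross the antimeridian are not handled.
    """
    totals = Counter()
    for flow in flows:
        lat, lon = flow["src_lat"], flow["src_lon"]
        if south <= lat <= north and west <= lon <= east:
            totals[flow["src_as"]] += flow["in_bytes"]
    # most_common() returns (asn, bytes) pairs, largest first.
    return totals.most_common(n)
```

For example, a box of roughly south=49, west=14, north=55, east=24 covers Poland; the top entries are the networks worth checking for routing to a distant PoP.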

Dashboard

It is fun and useful to set up different views of your network with multiple complementary charts on one screen. We’ve set up a Denial of Service dashboard that tracks spikes in suspicious traffic, such as DNS, SYN floods, NTP, or any other UDP spikes.

[Figure: Denial of Service dashboard]

In this example there was a spike in UDP traffic. We highlighted the area to zoom in on that period. Kibana then redraws all of the charts in the dashboard for that time period, which lets us see the details of the traffic spike. The pie chart shows which routers and interfaces received the majority of the traffic: the inner layer represents a router, the middle layer represents the interfaces attached to each router, and the outer layer represents UDP vs. TCP on a given interface. The blue segments, representing UDP, stand out in the dashboard below.

[Figure: Denial of Service dashboard zoomed in on the UDP spike]
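The three rings of that pie chart are nested groupings over the zoomed-in time range. A minimal Python sketch of the same breakdown, assuming flat flow dicts with hypothetical `timestamp`, `host`, `interface_in`, `protocol` (already translated to a name), and `in_bytes` fields:

```python
from collections import defaultdict


def nested_breakdown(flows, start, end):
    """Sum bytes by router, then interface, then protocol for flows with
    start <= timestamp < end, mirroring a three-ring pie chart on a
    zoomed-in time range.

    Flat field names (timestamp, host, interface_in, protocol, in_bytes)
    are assumptions for illustration.
    """
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for flow in flows:
        if start <= flow["timestamp"] < end:
            tree[flow["host"]][flow["interface_in"]][flow["protocol"]] += flow["in_bytes"]
    # Convert the nested defaultdicts to plain dicts for printing or serializing.
    return {
        router: {iface: dict(protos) for iface, protos in ifaces.items()}
        for router, ifaces in tree.items()
    }
```

Narrowing `start` and `end` plays the role of highlighting the spike: every ring is recomputed from only the flows in that window.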

Logstash – Collecting the data and helping make it useful

Config

The configuration for Logstash can be as simple as a UDP input using the netflow codec and an Elasticsearch output. Logstash also has many powerful filters that can be combined with conditional statements to add value and readability to your data. Here are a few examples of filters in action:

# Private ASNs (64512-65534) are actually AS2635
# Compare against numbers, not strings: dst_as is an integer field
if [netflow][dst_as] >= 64512 and [netflow][dst_as] <= 65534 {
	mutate {
		replace => { "[netflow][dst_as]" => "2635" }
	}
}
# Multiply by the sampling interval to estimate the total bytes and packets the flow represents
# (uses the event.get/event.set API from Logstash 5.0 and later)
if [netflow][sampling_interval] > 0 {
	ruby {
		code => "
			interval = event.get('[netflow][sampling_interval]')
			event.set('[netflow][in_bytes]', event.get('[netflow][in_bytes]') * interval)
			event.set('[netflow][in_pkts]', event.get('[netflow][in_pkts]') * interval)
		"
		add_tag => [ "multiplied" ]
	}
}

# Add a bits field (after the sampling multiplication above)
if [netflow][in_bytes] {
	ruby {
		code => "event.set('[netflow][in_bits]', event.get('[netflow][in_bytes]') * 8)"
	}
}
# Protocol friendly naming
translate {
	field => "[netflow][protocol]"
	destination => "[netflow][protocol]"
	override => true
	dictionary => { "6" => "TCP" "17" => "UDP" "1" => "ICMP" "47" => "GRE" "50" => "ESP" }
}
# Get the datacenter from the hostname
translate {
	field => "host"
	destination => "datacenter"
	dictionary => [
		"hostname", "citycode",
		...]
}
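To make the order of these steps explicit (ASN rewrite, sampling, bits, protocol name), here is a minimal Python sketch that applies the same logic to a single flow held as a plain dict. It assumes the keys match the codec's `netflow` sub-fields and is meant to illustrate what the filters compute, not to replace them:

```python
PRIVATE_ASN_MIN, PRIVATE_ASN_MAX = 64512, 65534  # 16-bit private-use ASN range
PROTOCOL_NAMES = {6: "TCP", 17: "UDP", 1: "ICMP", 47: "GRE", 50: "ESP"}


def enrich(netflow):
    """Apply the same steps as the Logstash filters to one flow dict."""
    flow = dict(netflow)  # work on a copy so the caller's record is untouched
    # Rewrite private destination ASNs to AS2635.
    if PRIVATE_ASN_MIN <= flow.get("dst_as", 0) <= PRIVATE_ASN_MAX:
        flow["dst_as"] = 2635
    # Scale sampled counters up to the estimated totals they represent.
    interval = flow.get("sampling_interval", 0)
    if interval > 0:
        flow["in_bytes"] *= interval
        flow["in_pkts"] *= interval
    # Derive a bits field from the (possibly scaled) byte count.
    if "in_bytes" in flow:
        flow["in_bits"] = flow["in_bytes"] * 8
    # Swap the protocol number for a name when one is known.
    if flow.get("protocol") in PROTOCOL_NAMES:
        flow["protocol"] = PROTOCOL_NAMES[flow["protocol"]]
    return flow
```

The bits field is derived after the sampling multiplication, so it reflects the estimated total rather than the sampled count; the Logstash filters rely on the same ordering.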

Use translate to replace raw numbers with friendly names, such as TCP flags, TCP/UDP ports, interface names, and hostnames:

if [host] == "hostname" {
	translate {
		field => "[netflow][input_snmp]"
		destination => "[netflow][interface_in]"
		dictionary => { "633" => "xe-1/0/0" }
		add_field => {
			"direction" => "inbound"
			"traffic_type" => "transit"
			"provider" => "ntt"
		}
	}

	translate {
		field => "[netflow][output_snmp]"
		destination => "[netflow][interface_out]"
		dictionary => { "633" => "xe-1/0/0" }
		add_field => {
			"direction" => "outbound"
			"traffic_type" => "transit"
			"provider" => "ntt"
		}
	}
}
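The per-host conditional matters because SNMP interface indexes are only unique within a single router. A minimal Python sketch of the same lookup keyed by (exporter host, ifIndex); the table reuses the placeholder values from the config, and the function and field names are hypothetical:

```python
# Hypothetical lookup table. SNMP ifIndex values are only unique within one
# router, so the key is (exporter host, ifIndex), like the per-host
# conditional in the Logstash config.
INTERFACES = {
    ("hostname", 633): {"name": "xe-1/0/0", "traffic_type": "transit", "provider": "ntt"},
}


def label_interfaces(flow):
    """Return a copy of `flow` with interface names and direction tags added."""
    labeled = dict(flow)
    for snmp_field, name_field, direction in (
        ("input_snmp", "interface_in", "inbound"),
        ("output_snmp", "interface_out", "outbound"),
    ):
        info = INTERFACES.get((flow.get("host"), flow.get(snmp_field)))
        if info:
            labeled[name_field] = info["name"]
            labeled["direction"] = direction
            labeled["traffic_type"] = info["traffic_type"]
            labeled["provider"] = info["provider"]
    return labeled
```

A flow from another router that happens to use ifIndex 633 gets no label, which is exactly the mistake a global, host-agnostic table would make.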

Filters can also add GeoIP data and ASN information, including the network name, which is useful for routers that don’t carry a full BGP table.

if "inbound" == [direction] {
	geoip {
		database => "/home/logstash/config/GeoLiteCity.dat"
		source => "[netflow][ipv4_src_addr]"
		target => "geoip_src"
	}
}

Use the cidr filter to tell Logstash about prefixes that have special meanings.

# Tag interesting traffic by prefix
cidr {
	add_field => { "interesting" => "true" }
	address => [ "%{[netflow][ipv4_src_addr]}", "%{[netflow][ipv4_dst_addr]}" ]
	network => [ "100.0.0.0/24" ]
}
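The cidr filter's check can be reproduced with Python's standard `ipaddress` module, which is handy for testing which flows a prefix list would tag before deploying it. A minimal sketch reusing the example prefix from the filter; the function name is hypothetical:

```python
import ipaddress

# Example prefix from the cidr filter; replace with your own list.
INTERESTING_NETWORKS = [ipaddress.ip_network("100.0.0.0/24")]


def is_interesting(src_addr, dst_addr):
    """Return True if either endpoint falls inside an interesting prefix,
    like the cidr filter checking both addresses against its networks."""
    for addr in (src_addr, dst_addr):
        ip = ipaddress.ip_address(addr)
        # `in` is simply False when the address and network IP versions differ.
        if any(ip in net for net in INTERESTING_NETWORKS):
            return True
    return False
```

Because an IPv6 address never matches an IPv4 network, the same prefix list can be applied to mixed traffic without raising errors.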

sFlow

Logstash does not have a native sFlow collector, so we used sflowtool to convert sFlow to NetFlow. Another option is to run sflowtool through Logstash’s pipe input and parse its text output into fields with the grok filter.
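For the pipe-input approach, each `FLOW` line from sflowtool's line-by-line output (`-l`) is a comma-separated record describing one sampled packet. The Python sketch below splits such a line into named fields, the same job a grok (or csv) filter would do in Logstash. The column order is an assumption based on the sflowtool documentation and should be checked against the version you run; the sample line is made up, using documentation addresses.

```python
# Assumed column order for sflowtool's line-by-line ("-l") FLOW records;
# verify it against the sflowtool documentation for your version.
FLOW_COLUMNS = [
    "type", "agent", "input_port", "output_port", "src_mac", "dst_mac",
    "ethernet_type", "in_vlan", "out_vlan", "src_ip", "dst_ip", "ip_protocol",
    "ip_tos", "ip_ttl", "src_port", "dst_port", "tcp_flags", "packet_size",
    "ip_size", "sampling_rate",
]


def parse_sflow_line(line):
    """Turn one FLOW line into a dict, or return None for other record types."""
    values = line.strip().split(",")
    if values[0] != "FLOW" or len(values) != len(FLOW_COLUMNS):
        return None
    record = dict(zip(FLOW_COLUMNS, values))
    # Each FLOW line is one sampled packet; scale by the sampling rate
    # to estimate the bytes it represents.
    record["estimated_bytes"] = int(record["packet_size"]) * int(record["sampling_rate"])
    return record


# Made-up sample line (documentation addresses, arbitrary counters).
sample = ("FLOW,192.0.2.10,12,14,001122334455,66778899aabb,0x0800,0,0,"
          "198.51.100.5,203.0.113.9,17,0x00,57,123,40000,0x00,490,468,2048")
print(parse_sflow_line(sample)["estimated_bytes"])  # 1003520
```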

The future

IPv6/IPFIX/NetflowV9

Logstash’s netflow codec supports NetFlow v9, which can include IPv6 addresses and sends flow data more efficiently. There is a patch for IPFIX, with support being added soon. Some things are still being worked out; for example, the GeoIP database filter doesn’t work with IPv6. This creates some filtering limitations, especially if your routers don’t carry a full routing table and you want to group traffic by ASN or location.

Alerting

This is a new system for us, and we are still tweaking it and getting used to the data, so alerting is not configured yet. Alerting is definitely an important part of network monitoring, and using flow data to trigger alerts can go well beyond the limitations of SNMP queries and traps. Some people have written their own scripts to query Elastic(search), compare the response with a threshold value, and alert if it is exceeded. There are even Nagios plugins that can query Elastic, parse the response, and trigger an alert if required. Elastic.co also has a paid tool called Watcher that can serve this purpose.
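The threshold check in such a script is the simple part. A minimal Python sketch, assuming the script has already run a search with a date_histogram aggregation and a sum of a bits field, and parsed the JSON response into a dict; the aggregation names are whatever the query chose, and sending the query is not shown:

```python
def find_alerts(response, agg_name, metric_name, threshold_bps, interval_s=1):
    """Return (bucket_key, bps) pairs whose rate exceeds `threshold_bps`.

    `response` is a parsed Elasticsearch search response with a
    date_histogram aggregation named `agg_name` and a sum sub-aggregation
    named `metric_name` over a bits field. With buckets longer than
    1 second, the sum is divided by `interval_s` to get a per-second rate.
    """
    alerts = []
    for bucket in response["aggregations"][agg_name]["buckets"]:
        bps = bucket[metric_name]["value"] / interval_s
        if bps > threshold_bps:
            # bucket["key"] is the bucket start time in epoch milliseconds.
            alerts.append((bucket["key"], bps))
    return alerts
```

Whatever sends the notification (email, chat, a Nagios check result) would only need to act when the returned list is non-empty.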

11 thoughts on “Open Source Flow Collecting with Elastic, Logstash, and Kibana”

  1. Great post. This will help us out a lot with our netflows. I also really love your dashboards; do you think I could get those from you?

    1. Thank you. I don’t think dashboards can be shared between systems; we’d need identical indexes, visualizations, and custom netflow fields. My advice is to think of a task where you end up looking at multiple visualizations to find details, or a routine you notice yourself going through to collect data about an event.

      The first thing that comes to mind is DDoS, which almost everyone is familiar with. I like to know what protocol the attack uses, what port, what IP is being attacked, which routers and interfaces see it, and whether it is high pps or high bps.

      For most of the visualizations, I used the Filters option for each item on the chart. The BPS/PPS flow detection charts use a Sum of BPS/PPS with a per-second Date Histogram, then split lines by filter.

      Each filter is traffic I’m interested in, like NTP (netflow.l4_dst_port: 123 OR netflow.l4_src_port: 123) or DNS (netflow.l4_dst_port: 53 OR netflow.l4_src_port: 53). If you want to see the top N values of a field such as IP address, you can use the Terms option and select the IP address field to graph the top ones.

      Hope this helps!
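The Filters option effectively evaluates each query separately and sums the matching traffic, so one flow can land in several lines. A minimal Python sketch of that split for the NTP and DNS examples, assuming flat flow dicts with hypothetical `timestamp`, `l4_src_port`, `l4_dst_port`, and `in_bits` fields:

```python
# Each named filter mirrors one Kibana "Filters" entry, e.g.
# NTP = netflow.l4_dst_port: 123 OR netflow.l4_src_port: 123
FILTERS = {
    "NTP": lambda f: 123 in (f["l4_src_port"], f["l4_dst_port"]),
    "DNS": lambda f: 53 in (f["l4_src_port"], f["l4_dst_port"]),
}


def bits_per_second_by_filter(flows):
    """Sum in_bits per (second, filter name); a flow may match several filters."""
    series = {}
    for flow in flows:
        second = int(flow["timestamp"])
        for name, matches in FILTERS.items():
            if matches(flow):
                key = (second, name)
                series[key] = series.get(key, 0) + flow["in_bits"]
    return series
```

A flow with both ports set to 123 is still counted once for NTP, matching the OR semantics of the query; traffic that matches no filter simply does not appear.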

  2. Hey, I am able to parse the data and I see data in the Discover tab. However, I am unable to create a proper visualization. Can you please help me out by providing a few sample visualizations, like a graph that shows the per-second bandwidth used by a router?

    1. Hello,

      The sample line graph (https://developer.files.wordpress.com/2016/01/line-chart.png) is a good place to start. I intentionally left the left-hand column and the top filter bar visible to show the graph settings. For the Y axis I chose Sum of bits (the netflow default is bytes; I shared a Logstash config to create a bits field). For the X axis I chose Date Histogram with a per-second interval. The date range in the top right should be longer than 1 hour. Elasticsearch has no real notion of a flow’s duration, so each flow is counted at a single point in time, as if it lasted 1 second, even if it lasted longer. Selecting a longer time range helps average out the data and smooth the output.

      Lastly, to make the graph more meaningful, you can use a Lucene query in the top bar to filter the data available to the graph. For example, if you set a hostname for your router in the Logstash config, you can add “hostname: RouterA” to filter on it. If you want to see the top N values of a field, choose Split Lines in the left column while setting up the visualization, select Terms, and then select the field you are interested in (for example, the hostname field to compare traffic per router on the graph).

      Hope this helps!

      1. Hello,

        Thanks for your explanation. By the time you replied, I had figured out that Logstash cannot give exact per-second data, for example if I want to find the real-time bandwidth utilization of an ISP. Can you tell me the logic for calculating the bits transferred per second from the netflow fields? Just give me an example with values and I will try to implement it. If I succeed, I will share my code with you, as your code has helped me a lot.
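A rough way to turn individual flow records into per-second bandwidth is to spread each flow's bits evenly across the seconds between its start and end times, instead of counting everything at a single timestamp. A minimal Python sketch, assuming the flow's start and end come from the `first_switched` and `last_switched` fields emitted by the Logstash netflow codec as ISO 8601 strings; the function name is hypothetical, and spreading evenly is a simplification, since real traffic within a flow is rarely uniform:

```python
from datetime import datetime


def spread_flow(in_bytes, first_switched, last_switched):
    """Spread a flow's bits evenly over every whole second it touched.

    Timestamps are ISO 8601 strings such as "2016-01-20T10:00:00.000Z";
    the result maps Unix seconds to bits for that second. Summing these
    maps across all flows gives an approximate bits-per-second series.
    """
    def to_epoch(ts):
        # Older Pythons' fromisoformat() rejects a trailing "Z".
        return datetime.fromisoformat(ts.replace("Z", "+00:00")).timestamp()

    first_sec = int(to_epoch(first_switched))
    # Guard against out-of-order timestamps so there is always one second.
    last_sec = max(int(to_epoch(last_switched)), first_sec)
    seconds = range(first_sec, last_sec + 1)
    bits_each = in_bytes * 8 / len(seconds)
    return {second: bits_each for second in seconds}
```

For example, a 15,000-byte flow from 10:00:00 to 10:00:04.5 touches five whole seconds, so each gets 24,000 bits; a flow whose start and end fall in the same second puts all of its bits in that second, which is what a single-timestamp count does for every flow.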

  3. Chris, thank you for your post! I recently set up a netflow data flow into my ELK stack and found your examples insightful.

    To do so, I needed to set up nfcapd as a service and run nfdump as a cron job to convert the netflow data into a simple parsable log that Logstash picks up with the file input. Sadly, I couldn’t get the netflow input filter to work for the life of me and resorted to this manual, scheduled method instead. https://github.com/phaag/nfdump

    For alerting, I use ElastAlert (a Python script that queries Elasticsearch) and have had some success with it. Now I need to study my netflow data and figure out what I want to be alerted on. https://elastalert.readthedocs.io/en/latest/

    Thanks again for your work and contribution!
    Cheers.

    1. Thanks!

      It sounds like you are familiar with operating Logstash, so it’s strange that the netflow codec didn’t work for you. I would check which NetFlow version you are exporting, since the codec only supports two (v5 and v9). V9 is a bit slow to start collecting flow data, because records can’t be decoded until the exporter’s templates have been received; when I first tried V9, I thought it actually wasn’t working.

      ElastAlert is really neat; it’s a very flexible, customizable monitoring solution. I’ve been using it recently too!

  4. I had the same issues as these ELK stack administrators: https://discuss.elastic.co/t/logstash-and-elasticsearch-setup-not-working-for-netflow-logs/27698/2, where certain templates/field mappings weren’t found and, consequently, none of the incoming netflow data was parsed. Period.

    Editing and pointing to new definitions in the codec filter didn’t work either – so I settled for doing it the “hard” way – which seems to be working fine for now.

    My difficulty with elastalert is coming up with useful scenarios to alert on.
