Your Data Zen starts here.

Botnet traffic over SSL

Another great example of how big data analysis contributes to IT security operations. The data were collected through a free proxy and anonymized automatically during collection.


The dataset consisted of encrypted traffic towards a huge number of WordPress-based websites, with a high prevalence of DNS communication, collected as a list of anonymized flows.


All recognized WordPress sites were listed for crawling to collect domain registration information.

This crawling exercise revealed that the domains were registered in batches, each within a similar time range.
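One way to look for such registration batches is to sort the registration timestamps and start a new batch whenever the gap to the previous registration exceeds a threshold. The sketch below is only an illustration: the function name `group_into_batches`, the one-hour `max_gap` and the sample domains are assumptions, not values from the original analysis.

```python
from datetime import datetime, timedelta

def group_into_batches(registrations, max_gap=timedelta(hours=1)):
    """Group (domain, registered_at) pairs into batches.

    Consecutive registrations fall into the same batch when the time
    between them is at most max_gap (an assumed threshold).
    """
    ordered = sorted(registrations, key=lambda item: item[1])
    batches = []
    for domain, registered_at in ordered:
        # Compare with the timestamp of the last entry in the last batch.
        if batches and registered_at - batches[-1][-1][1] <= max_gap:
            batches[-1].append((domain, registered_at))
        else:
            batches.append([(domain, registered_at)])
    return batches

# Hypothetical sample data, not taken from the original dataset.
sample = [
    ("a.example", datetime(2016, 1, 5, 10, 0)),
    ("b.example", datetime(2016, 1, 5, 10, 20)),
    ("c.example", datetime(2016, 3, 1, 8, 0)),
    ("d.example", datetime(2016, 1, 5, 10, 45)),
]
batches = group_into_batches(sample)
batch_sizes = [len(batch) for batch in batches]
```

With real WHOIS data the threshold would need tuning; too small a gap splits one batch into many, too large a gap merges unrelated registrations.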

From this data, “location” information was extracted. This information was then linked to the unique URI of each WordPress site; from a data point of view, each such link is a relation.

Where a connection occurred more than once, the multiplicities were ignored to improve the readability of the final picture.
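Ignoring multiplicities amounts to treating the relations as a set of unique (location, URI) pairs. A minimal sketch, assuming the flows have already been reduced to such pairs; `build_relation_graph` and the sample values are hypothetical names used only for illustration:

```python
from collections import defaultdict

def build_relation_graph(flows):
    """Collapse repeated (location, uri) flows into unique edges."""
    edges = set(flows)  # a set keeps each pair once, dropping multiplicities
    neighbours = defaultdict(set)
    for location, uri in edges:
        neighbours[location].add(uri)
    return edges, dict(neighbours)

# Hypothetical sample flows, not taken from the original dataset.
flows = [
    ("loc-A", "site1.example/wp"),
    ("loc-A", "site1.example/wp"),  # repeated connection, counted once
    ("loc-A", "site2.example/wp"),
    ("loc-B", "site1.example/wp"),
]
edges, neighbours = build_relation_graph(flows)
```

The resulting edge list can be exported to a graph visualization tool of choice.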

Below is the raw data, yet certain patterns are already obvious:

4-29-2016 10-41-28 PM

  • There are three main affected locations. {Rounded areas on top of each peak}
  • Each location made a huge number of connections to unique WordPress domains. {These rounded areas are actually domains linked to the main locations}
  • Certain WordPress domains were reached from more than one location; some of them were queried from all locations. {Pattern where two different colours meet at one group of nodes in the picture}
  • Analysis of the DNS requests showed that certain domains were named using a specific pattern, while the rest of the names were random. These names were added to the “word list” for future pattern matching.
  • The previously mentioned behavior is typical of so-called DGA-based malware (DGA: domain generation algorithm).
  • The botnet was identified by correlating the traffic related to the affected hosts. In this case, had the creator randomized the time span between the command and the machine's answer, it would have been close to impossible to detect the botnet activity. {This was a separate analysis, not included in this article. The process was similar to the one used by the Stratosphere IPS}
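The remark about timing in the last point can be illustrated with a simple regularity score. The sketch below computes the coefficient of variation of the gaps between events for one host: values near zero mean almost constant gaps, which is what makes non-randomized bot traffic stand out. This is an illustrative stand-in and not the method used in the separate analysis or by the Stratosphere IPS; the function name and the sample timestamps are assumptions.

```python
from statistics import mean, pstdev

def interval_regularity(timestamps):
    """Return the coefficient of variation of the gaps between events.

    Values near 0 mean almost constant gaps (machine-like timing);
    larger values mean irregular timing.
    """
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three timestamps")
    average_gap = mean(gaps)
    if average_gap == 0:
        raise ValueError("all timestamps are identical")
    return pstdev(gaps) / average_gap

# Hypothetical timestamps in seconds, not taken from the original dataset.
regular = [0, 60, 120, 181, 240, 300]    # roughly one event per minute
jittered = [0, 15, 130, 150, 290, 300]   # same span, randomized gaps
regular_score = interval_regularity(regular)
jittered_score = interval_regularity(jittered)
```

The jittered series gets a much higher score than the regular one, so a threshold on this score would no longer flag it.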


Using big-data analysis and open source tools, it was possible to identify the threat and to report the suspicion, which hopefully helped to block part of the botnet.

Do you have a dataset you would like to analyze for similar behavior? Reach out to me and I will be happy to help or advise.


The following pictures are added only for illustration, to please security data science geeks like myself.

A detail of the central component:

4-29-2016 10-27-32 PM

This picture shows the first grouped view of the raw data. The central “star-like” component suggests a symmetric relation within a single class of nodes. Community detection was used for coloring.
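The article does not say which community detection algorithm or tool produced the coloring. As a rough illustration of the idea, the sketch below uses label propagation, where each node repeatedly adopts the most common label among its neighbours. The deterministic tie-break and all names and sample edges are assumptions made for this example.

```python
from collections import Counter

def to_adjacency(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    adjacency = {}
    for left, right in edges:
        adjacency.setdefault(left, set()).add(right)
        adjacency.setdefault(right, set()).add(left)
    return adjacency

def label_propagation(adjacency, max_rounds=100):
    """Assign a community label to every node.

    A node keeps its label if it is among the most common neighbour
    labels; otherwise it takes the smallest of them (a simplification
    that makes the result deterministic).
    """
    labels = {node: node for node in adjacency}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(adjacency):
            if not adjacency[node]:
                continue  # isolated nodes keep their own label
            counts = Counter(labels[n] for n in adjacency[node])
            top = max(counts.values())
            candidates = sorted(l for l, c in counts.items() if c == top)
            if labels[node] not in candidates:
                labels[node] = candidates[0]
                changed = True
        if not changed:
            break
    return labels

# Hypothetical edges: two locations, their domains, one shared domain (d6).
edges = [
    ("loc-A", "d1"), ("loc-A", "d2"), ("loc-A", "d3"),
    ("loc-B", "d4"), ("loc-B", "d5"),
    ("loc-A", "d6"), ("loc-B", "d6"),
]
labels = label_propagation(to_adjacency(edges))
```

Each distinct label can then be mapped to a color. Domains attached to a single location end up in that location's community, while a shared domain such as `d6` falls into one of the communities it touches.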


Overview of the complete reconstructed structure of the botnet:


A different coloring also highlights the central component's structure:


Additional visualizations:




4-29-2016-10-15-27-pm 4-29-2016-10-18-28-pm 4-29-2016-10-27-32-pm 4-29-2016-10-30-43-pm 4-29-2016-10-31-36-pm

4-29-2016-10-12-44-pm 4-29-2016-10-17-10-pm 4-29-2016-10-19-35-pm


© 2020 4n6strider
