Start of the traffic project

So, I managed to gather about 1 GB of records from the pfsense installation and grab them from the box (filter.log files that you can find under /var/log).

And I have a list of 16 logs that I need to concatenate.

I had a lot of trouble concatenating it since I tried multiple times to use writelines() method from the file object.

The code that worked for me:

outputcsv = open('//Users//tudorsorin//Downloads//var//log//firewallrepo//filter.csv','w')
f = open(f'//Users//tudorsorin//Downloads//var//log//firewallrepo//filter.concat', 'r')
lines = f.readlines()
for line in lines:
    outputcsv.writelines(",".join(line.split(" ")[0:3])+","+line.split(" ")[-1])
f.close()
outputcsv.close()

The idea is that it’s already in CSV format and all you need to do is to modify the “header” that normally looks like Feb 20 07:58:18 soaretudorhome filterlog[41546]: to something like Feb,20, 07:58:18, and the rest remains the same.

Suprisingly, if you want to load it directly to a dataframe using pd.read_csv and you don’t force a header it works and I have all the data there with NaN in the fields that are not filled.

After this is done, we can filter only traffic that is done over ppoe0 which is the WAN interface, and you can easily do that using temp = df[df[‘pppoe0’] == ‘pppoe0’]

So far so good. I also took a look at a generic pppoe0 line and came to the conclusion that the colums that interest me are [0,1,2,21,22,23,24] which represent the date and source ip, destination ip and ports (source and destination). You can than filter the dataframe by temp = temp.iloc[:, [0,1,2,21, 22, 23, 24]]

So we finally we have a dateframe that we can work with. Now what remains is to change the table header and try to enhance it with extra info.

Cheers

Sorin