Hi,
Since I needed the data to be available for further processing in a centralized manner, I decided to store both the first draft and the results of subsequent location API queries in a MongoDB database.
Here is the short code snippet used for this task:
import pandas as pd
import pymongo

# Load the concatenated log file; the file has no header row, so pandas
# takes the first data row as the column names
df = pd.read_csv(r'C:/Users/Sorin/Downloads/filter.concat')

# Keep only the rows that belong to the pppoe0 interface
test = df[df['pppoe0'] == 'pppoe0']

# Select the timestamp columns and the Source/Destination columns
temp = test.iloc[:, [0, 1, 2, 21, 22, 23, 24]]

# The column names are really values from the first log line, so rename
# them to something meaningful (col_names avoids shadowing the built-in dict)
col_names = {'Feb': 'Month',
             '18': 'Day',
             '09:16:00': 'Hour',
             '184.105.247.254': 'SourceIP',
             '86.123.204.222': 'DestinationIP',
             '48307': 'SourcePort',
             '447': 'DestinationPort'}
# Reassign instead of inplace=True to avoid a SettingWithCopyWarning on the slice
temp = temp.rename(columns=col_names)

# Connect to the local MongoDB instance
myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["traffic_init_load"]

# Convert each row to a dictionary and insert the documents
temp.reset_index(inplace=True)
data_dict = temp.to_dict("records")
mycol.insert_many(data_dict)
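As a side note, to_dict("records") turns every DataFrame row into one dictionary, and insert_many then stores each dictionary as one MongoDB document. A minimal illustration with made-up sample values (not the real log data):

```python
import pandas as pd

# Hypothetical sample with two of the renamed columns
sample = pd.DataFrame({
    'SourceIP': ['184.105.247.254', '86.123.204.222'],
    'SourcePort': [48307, 48308],
})

records = sample.to_dict('records')
# Each row becomes one dict, i.e. one future MongoDB document:
# [{'SourceIP': '184.105.247.254', 'SourcePort': 48307},
#  {'SourceIP': '86.123.204.222', 'SourcePort': 48308}]
```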
From the concatenated file, my interest is strictly in the pppoe traffic. We keep only the columns related to the timestamp and to Source and Destination, and then write the resulting documents to MongoDB.
That is all.