Month: January 2023

  • Loading unique IPs in MongoDB

    Hi,

    Today I played a little with the possibility of storing the unique IP addresses in a separate collection.

    Since I will be using a subscription from ip-api.com, I checked and it seems there is an option to query info in batches, with a limit of 100 IPs per payload.

    At first glance there are 227200 unique IPs in my dataset. At 100 IPs per payload, that comes to 2272 payloads to be queried.

    The code looks more or less like this:

    # `temp` is the DataFrame built in the initial load post
    unique_ip = temp['SourceIP'].unique()
    # Split the unique IPs into chunks of at most 100 (the batch limit)
    unique_list = [unique_ip[i:i + 100] for i in range(0, len(unique_ip), 100)]
    # Build one document per chunk, with a 1-based id
    data = [{'id': i, 'payload': chunk.tolist()}
            for i, chunk in enumerate(unique_list, start=1)]
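
    As a quick sanity check of the chunking arithmetic, here is a minimal sketch that uses plain Python lists instead of the real DataFrame. The placeholder strings and the helper name chunk are only for illustration; they are not part of my actual script.

```python
import math

BATCH_SIZE = 100  # batch limit mentioned above


def chunk(items, size=BATCH_SIZE):
    """Split a sequence into consecutive slices of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# Placeholder strings stand in for the real unique IPs
fake_ips = [f"ip-{n}" for n in range(227200)]
chunks = chunk(fake_ips)

# The number of chunks is the item count divided by the batch size, rounded up
assert len(chunks) == math.ceil(len(fake_ips) / BATCH_SIZE)
print(len(chunks))      # 2272
print(len(chunks[-1]))  # 100
```

    If the count were not a multiple of 100, the last chunk would simply be shorter than the others.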

    Once this is constructed, you only need to go through the list element by element and insert each one into MongoDB using this code:

    import pymongo
    
    myclient = pymongo.MongoClient("mongodb://localhost:27017/")
    mydb = myclient["mydatabase"]
    mycol = mydb["unique_ip"]
    # Insert the payload documents one at a time
    for doc in data:
        mycol.insert_one(doc)
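
    The loop does the job, but the same documents could probably also be sent in a single call with insert_many. This is a minimal sketch, not what I actually ran; the helper name insert_payloads is hypothetical, and it works with any object that exposes a pymongo-style insert_many.

```python
def insert_payloads(collection, documents):
    """Insert all documents in one call and return how many were inserted."""
    if not documents:
        return 0  # insert_many rejects an empty list, so skip the call
    # ordered=False lets the server continue past a failing document
    result = collection.insert_many(documents, ordered=False)
    return len(result.inserted_ids)


# Usage with the names from the snippet above:
# inserted = insert_payloads(mycol, data)
```

    With 2272 small documents the difference is unlikely to matter much, but it saves one round trip per document.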

    The next step will be to take the documents from the collection one by one and send each payload to the API endpoint.
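
    A rough sketch of how that step might look, using only the standard library. The URL below is my assumption about the free batch endpoint and should be checked against the ip-api.com documentation; a subscription will most likely need a different host plus an API key. The function names are hypothetical.

```python
import json
import urllib.request

# Assumption: free batch endpoint; verify against the ip-api.com docs
BATCH_URL = "http://ip-api.com/batch"
MAX_BATCH = 100  # limit of IPs per payload


def build_batch_request(ips, url=BATCH_URL):
    """Build a POST request whose body is a JSON array of IP strings."""
    if len(ips) > MAX_BATCH:
        raise ValueError(f"at most {MAX_BATCH} IPs per payload, got {len(ips)}")
    body = json.dumps(list(ips)).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_batch(ips, url=BATCH_URL, timeout=10):
    """Send one payload and return the decoded JSON response."""
    request = build_batch_request(ips, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage with the collection from above (makes real network calls):
# for doc in mycol.find():
#     results = fetch_batch(doc["payload"])
```

    Keeping the request construction separate from the network call makes it possible to check the payload without hitting the API.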

    Tnx,

    Sorin

  • Loading data to a Mongo database for further processing

    Hi,

    Since I needed my data to be available for further processing in a centralized manner, I decided to store the first draft, as well as the results of further queries to the location API, in a Mongo database.

    Here is the short code snippet that was used for this task:

    import pandas as pd
    
    import pymongo
    
    df = pd.read_csv(r'C://Users//Sorin//Downloads//filter.concat')
    
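    # The file appears to have no header row, so pandas uses the first record
    # as column names; that is why the columns are called 'pppoe0', 'Feb', etc.
    # Keep only traffic on the pppoe0 interface
    test = df[df['pppoe0'] == 'pppoe0']
    temp = test.iloc[:, [0, 1, 2, 21, 22, 23, 24]]

    # Map the first-record values to meaningful column names
    # (named column_names so the built-in dict is not shadowed)
    column_names = {'Feb': 'Month',
                    '18': 'Day',
                    '09:16:00': 'Hour',
                    '184.105.247.254': 'SourceIP',
                    '86.123.204.222': 'DestinationIP',
                    '48307': 'SourcePort',
                    '447': 'DestinationPort'}

    # Rename into a new DataFrame instead of modifying a slice in place
    temp = temp.rename(columns=column_names)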
    
    
    myclient = pymongo.MongoClient("mongodb://localhost:27017/")
    mydb = myclient["mydatabase"]
    mycol = mydb["traffic_init_load"]
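    # reset_index() keeps the original row number as an 'index' field in each document
    temp = temp.reset_index()
    data_dict = temp.to_dict("records")
    # Insert all documents into the collection
    mycol.insert_many(data_dict)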

    From the concatenated file, I am only interested in traffic on the pppoe0 interface.

    Only the date and time columns and the ones for source and destination IP and port are kept, and after that the rows are written to MongoDB as documents.

    That is all.