Categories
newtools python

Loading unique IPs in MongoDB

Hi,

So today I played a little bit with the possibility of storing the unique IP addresses in a separate collection.

Since I will use a subscription from ip-api.com, there is an option to query info in batches, with a limit of 100 IPs per payload.

So, at first glance there are 227,200 unique IPs in my dataset. That accounts for 2,272 payloads to be queried.

The code looks more or less like this:

# split the unique source IPs into batches of 100 (the ip-api.com limit)
unique_ip = temp['SourceIP'].unique()
unique_list = [unique_ip[i:i + 100] for i in range(0, len(unique_ip), 100)]

# wrap each batch in a document with a sequential id
data = []
for i, batch in enumerate(unique_list, start=1):
    data.append({'id': i, 'payload': batch.tolist()})

Once this is constructed, you only need to insert the documents into MongoDB using this code:

import pymongo

# connect to the local MongoDB instance and bulk-insert the batch documents
myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["unique_ip"]
mycol.insert_many(data)

The next step will involve taking the documents from the collection one by one and serving them to the API endpoint.
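As a sketch of that next step, posting one stored batch to the ip-api.com batch endpoint could look like the snippet below. The endpoint URL and response handling are my assumptions based on the public docs, and a paid subscription would use its own key and URL:

import requests
import pymongo

myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mycol = myclient["mydatabase"]["unique_ip"]

# iterate over the stored batch documents and post each 100-IP payload
for doc in mycol.find():
    response = requests.post("http://ip-api.com/batch", json=doc['payload'])
    response.raise_for_status()
    print(response.json())  # one location record per IP in the batch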

Tnx,

Sorin

Categories
newtools python

Loading data to a Mongo database for further processing

Hi,

Since I needed my data to be available for further processing in a centralized manner, I have decided to store the first draft, as well as the results of further queries to the location API, in a Mongo database.

Here is the short code snippet that was used for this task:

import pandas as pd
import pymongo

df = pd.read_csv(r'C://Users//Sorin//Downloads//filter.concat')

# keep only the rows for traffic on the pppoe0 (WAN) interface
test = df[df['pppoe0'] == 'pppoe0']
# keep only the date, source/destination IP and source/destination port columns
temp = test.iloc[:, [0, 1, 2, 21, 22, 23, 24]].copy()

# the file has no real header, so pandas promoted the first data row to
# column names; map those sample values to meaningful names
column_names = {'Feb': 'Month',
                '18': 'Day',
                '09:16:00': 'Hour',
                '184.105.247.254': 'SourceIP',
                '86.123.204.222': 'DestinationIP',
                '48307': 'SourcePort',
                '447': 'DestinationPort'}

temp.rename(columns=column_names, inplace=True)

# connect to the local MongoDB instance and insert one document per row
myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["traffic_init_load"]
temp.reset_index(inplace=True)
data_dict = temp.to_dict("records")
mycol.insert_many(data_dict)

From the concatenated file, my interest is strictly in the traffic over pppoe.

We keep only the columns related to the date, source and destination, and after that the documents are written to MongoDB.
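If you later want to pull the documents back into pandas for analysis, a minimal sketch (using the database and collection names from the snippet above) could look like this:

import pandas as pd
import pymongo

myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mycol = myclient["mydatabase"]["traffic_init_load"]

# read every document back, dropping Mongo's internal _id field
df_back = pd.DataFrame(list(mycol.find({}, {'_id': 0})))
print(df_back.head())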

That is all.

Categories
python

Start of the traffic project

So, I managed to gather about 1 GB of records from the pfSense installation and grab them from the box (the filter.log files that you can find under /var/log).

And I have a list of 16 logs that I need to concatenate.

I had a lot of trouble converting it, since I tried multiple times to use the writelines() method of the file object.

The code that worked for me:

# rewrite the syslog prefix "Feb 20 07:58:18 host filterlog[pid]:" to
# "Feb,20,07:58:18," and keep the comma-separated payload (the last
# space-delimited field) as-is
with open('//Users//tudorsorin//Downloads//var//log//firewallrepo//filter.concat', 'r') as f, \
     open('//Users//tudorsorin//Downloads//var//log//firewallrepo//filter.csv', 'w') as outputcsv:
    for line in f:
        fields = line.split(" ")
        outputcsv.write(",".join(fields[0:3]) + "," + fields[-1])

The idea is that the payload is already in CSV format, and all you need to do is modify the “header” that normally looks like Feb 20 07:58:18 soaretudorhome filterlog[41546]: into something like Feb,20,07:58:18, and the rest remains the same.

Surprisingly, if you load it directly into a dataframe using pd.read_csv and you don’t force a header, it works, and I have all the data there with NaN in the fields that are not filled.
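For reference, a minimal sketch of that load (the path is the one from the snippet above):

import pandas as pd

# no explicit header is forced: pandas promotes the first row to column
# names, and rows with fewer fields are padded with NaN
df = pd.read_csv('//Users//tudorsorin//Downloads//var//log//firewallrepo//filter.csv')
print(df.head())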

After this is done, we can filter only the traffic that goes over pppoe0, which is the WAN interface. You can easily do that using temp = df[df['pppoe0'] == 'pppoe0'].

So far so good. I also took a look at a generic pppoe0 line and came to the conclusion that the columns that interest me are [0, 1, 2, 21, 22, 23, 24], which represent the date, source IP, destination IP and the ports (source and destination). You can then filter the dataframe with temp = temp.iloc[:, [0, 1, 2, 21, 22, 23, 24]].

So finally we have a dataframe that we can work with. Now what remains is to change the table header and try to enhance the data with extra info.

Cheers

Sorin

Categories
newtools

Microsoft Teams blocked by pfBlockerNG

Hi,

One short tip to remember. I’ve been struggling for a while now with the fact that pfBlockerNG was blocking my Teams connection for whatever reason.

I couldn’t figure out the correct way to fix this until today. I should have known that there isn’t a range of IPs that can be whitelisted to make it work; the problem is related to a blocked domain.

This became evident today when I took a look at the Reports tab, the Alerts subtab, and filtered by interface.

In order to fix it, you will need to go to the DNSBL tab and expand the TLD Exclusion List, so that you can add the general domain that should be excluded.

You could also whitelist each subdomain, but since we are talking about Microsoft, I think this is easier.

The way this works, at least from what I understood, is that it will allow all hostnames under the general domain and only block the ones that are specifically blacklisted.

That would be all for today,

Sorin

Categories
python

Python Kata on Codewars

Hi,

Since I pretty much broke the internet trying to solve the following “kata” with pieces of code, let’s paste it here as well, because it makes me proud.

Here is the link to the kata: https://www.codewars.com/kata/5977ef1f945d45158d00011f

And here is my “solution”, which took quite a long time to get right:

def sep_str(st):
    # explode each word into a list of its letters
    test_list = [list(word) for word in st.split()]
    if test_list:
        # pad every word with empty strings up to the longest word's length
        max_len = max(map(len, test_list))
        for a in test_list:
            a.extend([""] * (max_len - len(a)))
        # transpose: row i collects the i-th letter of every word
        result = [[word[i] for word in test_list] for i in range(max_len)]
    else:
        result = []
    return result
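For example, a quick check of my own (not part of the kata tests):

print(sep_str("hello world"))
# [['h', 'w'], ['e', 'o'], ['l', 'r'], ['l', 'l'], ['o', 'd']]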

That is all.

Cheers!

Categories
newtools

Traffic statistics – new project

Hi,

For some time I have wanted to understand how the traffic on my network is actually shaped.

For that purpose, I first purchased a Synology router, but it seems it doesn’t have that many traffic logging capabilities, so I kept it and put the following box in front of it.

It’s a cool toy, but ultimately I wanted to have pfSense installed on it with logging activated, so that I can gather as much data as possible.

It’s now installed, and hopefully it will be the start of some articles related to data manipulation and also, maybe, some administration insights.

Tnx,

Sorin

Categories
linux

Enable time sync on Manjaro

So I wanted for a while to use and learn Manjaro, and I grabbed Cinnamon 21.1.0.

The installation process is pretty straightforward: I set up the correct time zone and installed all of the default packages.

Guess what, after rebooting the laptop the timezone was set correctly but the actual time was way off.

I tried to see if I could easily find a post explaining how it’s done, but the standard GUI way didn’t work.

The actual solution is in the transcript below:

[sorin-20fjs3dr01 ~]# timedatectl
               Local time: Sb 2021-08-28 13:07:40 EEST
           Universal time: Sb 2021-08-28 10:07:40 UTC
                 RTC time: Sb 2021-08-28 10:07:40
                Time zone: Europe/Bucharest (EEST, +0300)
System clock synchronized: no
              NTP service: inactive
          RTC in local TZ: no
[sorin-20fjs3dr01 ~]# systemctl status ntpd.service
○ ntpd.service - Network Time Service
     Loaded: loaded (/usr/lib/systemd/system/ntpd.service; disabled; vendor preset: disabled)
     Active: inactive (dead)
[sorin-20fjs3dr01 ~]#  systemctl status systemd-timesyncd.service
○ systemd-timesyncd.service - Network Time Synchronization
     Loaded: loaded (/usr/lib/systemd/system/systemd-timesyncd.service; disabled; vendor preset: enabled)
     Active: inactive (dead)
       Docs: man:systemd-timesyncd.service(8)
[sorin-20fjs3dr01 ~]# systemctl start systemd-timesyncd.service
[sorin-20fjs3dr01 ~]# ^C
[sorin-20fjs3dr01 ~]# systemctl status systemd-timesyncd.service
● systemd-timesyncd.service - Network Time Synchronization
     Loaded: loaded (/usr/lib/systemd/system/systemd-timesyncd.service; disabled; vendor preset: enabled)
     Active: active (running) since Sat 2021-08-28 13:09:09 EEST; 2h 59min left
       Docs: man:systemd-timesyncd.service(8)
   Main PID: 2080 (systemd-timesyn)
     Status: "Initial synchronization to time server 195.135.194.3:123 (0.manjaro.pool.ntp.org)."
      Tasks: 2 (limit: 19010)
     Memory: 1.3M
        CPU: 51ms
     CGroup: /system.slice/systemd-timesyncd.service
             └─2080 /usr/lib/systemd/systemd-timesyncd

aug 28 13:09:09 sorin-20fjs3dr01 systemd[1]: Starting Network Time Synchronization...
aug 28 13:09:09 sorin-20fjs3dr01 systemd[1]: Started Network Time Synchronization.
aug 28 10:09:10 sorin-20fjs3dr01 systemd-timesyncd[2080]: Initial synchronization to time server 195.135.194.3:123 (0.manjaro.pool.ntp.org).
[sorin-20fjs3dr01 ~]# systemctl enable systemd-timesyncd.service
Created symlink /etc/systemd/system/dbus-org.freedesktop.timesync1.service → /usr/lib/systemd/system/systemd-timesyncd.service.
Created symlink /etc/systemd/system/sysinit.target.wants/systemd-timesyncd.service → /usr/lib/systemd/system/systemd-timesyncd.service.
[sorin-20fjs3dr01 ~]# 

It turns out that both ntpd and timesyncd are dead and do not start by default, so the actual fix is to start and enable timesyncd.
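In short, the fix boils down to the commands from the transcript above:

systemctl start systemd-timesyncd.service
systemctl enable systemd-timesyncd.service
timedatectl    # verify that "System clock synchronized" now reports yes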

Cheers,

Sorin

Categories
Uncategorized

Getting stocks basic data using yfinance

Hi,

If you are thinking of investing, and also want a perfect opportunity to play with data in pandas, here is the use case I am working on.

Basically, from what I understood, if you want to value invest, there are two main parameters to look at before doing any other in-depth research: P/B and P/E. Both of them show whether the company has the potential to grow.

How can we retrieve these parameters using Python, from Yahoo Finance for example? The code that worked for me is as follows:

import yfinance as yf
import pandas as pd

# scrape the current list of S&P 500 companies from Wikipedia
payload = pd.read_html('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
df_sp = payload[0]

statscsv = open('stats.csv', 'a')

for value in df_sp['Symbol']:
    stock = yf.Ticker(value)
    # trailingPE is Yahoo's trailing P/E field; .get() yields None
    # (written as the string 'None') when a value is missing
    if 'priceToBook' in stock.info:
        statscsv.write(value + "," + str(stock.info.get('priceToBook')) + "," + str(stock.info.get('trailingPE')) + "\n")

statscsv.close()

I’ve tried a lot to put the info directly into a pandas DataFrame and it did not work, so for the purpose of querying the API only once, it makes a lot of sense to store it in a CSV file saved locally.

After it is saved locally, you can manually load it into a DataFrame object (for my usage, I manually added the column names Symbol,PB,PE at the beginning of the file):

df_pb = pd.read_csv("stats.csv")

From what I saw, in some cases P/B data is not available in the output, so the value is the string ‘None’.

You can manually change that by replacing it with 0 and storing the result in a different DataFrame, like this:

df_pb_clean = df_pb.replace({"None":"0"})

After you have done this, you also need to convert the types of the columns from object to float64 so that you can query specific values:

df_pb_clean['PB'] = df_pb_clean['PB'].astype(float)
df_pb_clean['PE'] = df_pb_clean['PE'].astype(float)

After all of this is done, you can query it as easily as:

df_pb_green = df_pb_clean.query('0.0 < PB < 2.0')

And after that filter, maybe also apply a P/E filter for your use case.

The main goal is to filter only the companies with growth potential, so that we can then retrieve historical data and look at the main methods of analysis.
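As a teaser for the historical data part, yfinance exposes a history() method on the Ticker object; a minimal sketch (the ticker is just an arbitrary example of mine):

import yfinance as yf

# daily OHLCV data for the last year; 'AAPL' is an arbitrary example
hist = yf.Ticker("AAPL").history(period="1y")
print(hist[['Open', 'Close', 'Volume']].tail())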

Cheers

Categories
kafka

SASL config issue on latest Kafka versions

Hello,

Today I want to share with you a problem that we needed to fix when we decided to activate SASL.

Normally, the steps are pretty straightforward and you can follow the Confluent docs or the general Apache Kafka documentation.

The main catch is that if you have a certain property in your config file, the following error will appear in a loop:

[2021-01-11 09:17:28,052] ERROR Processor [0..n] closed connection from null (kafka.network.Processor)
java.io.IOException: Channel could not be created for socket java.nio.channels.SocketChannel[closed]
	at org.apache.kafka.common.network.Selector.buildAndAttachKafkaChannel(Selector.java:348)
	at org.apache.kafka.common.network.Selector.registerChannel(Selector.java:329)
	at org.apache.kafka.common.network.Selector.register(Selector.java:311)
	at kafka.network.Processor.configureNewConnections(SocketServer.scala:1024)
	at kafka.network.Processor.run(SocketServer.scala:757)
	at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.apache.kafka.common.KafkaException: java.lang.NullPointerException
	at org.apache.kafka.common.network.SaslChannelBuilder.buildChannel(SaslChannelBuilder.java:228)
	at org.apache.kafka.common.network.Selector.buildAndAttachKafkaChannel(Selector.java:338)
	... 5 more
Caused by: java.lang.NullPointerException
	at java.base/java.util.Objects.requireNonNull(Objects.java:221)
	at org.apache.kafka.common.security.authenticator.DefaultKafkaPrincipalBuilder.fromOldPrincipalBuilder(DefaultKafkaPrincipalBuilder.java:77)
	at org.apache.kafka.common.network.ChannelBuilders.createPrincipalBuilder(ChannelBuilders.java:216)
	at org.apache.kafka.common.security.authenticator.SaslServerAuthenticator.<init>(SaslServerAuthenticator.java:183)
	at org.apache.kafka.common.network.SaslChannelBuilder.buildServerAuthenticator(SaslChannelBuilder.java:262)
	at org.apache.kafka.common.network.SaslChannelBuilder.lambda$buildChannel$0(SaslChannelBuilder.java:207)
	at org.apache.kafka.common.network.KafkaChannel.<init>(KafkaChannel.java:143)
	at org.apache.kafka.common.network.SaslChannelBuilder.buildChannel(SaslChannelBuilder.java:224)
	... 6 more

The cause for this is the property:

principal.builder.class=org.apache.kafka.common.security.auth.DefaultPrincipalBuilder

Normally, for the latest versions of Apache Kafka, like 2.x.x, it should not be set at all (the old DefaultPrincipalBuilder class is deprecated there, and its compatibility path is what throws the NullPointerException above), so that when the process starts the effective value is:

principal.builder.class=null
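For reference, a minimal SASL/PLAIN server.properties sketch without that property (the listener and mechanism values are illustrative placeholders, not our actual setup):

# server.properties - minimal SASL/PLAIN sketch
listeners=SASL_PLAINTEXT://0.0.0.0:9092
security.inter.broker.protocol=SASL_PLAINTEXT
sasl.mechanism.inter.broker.protocol=PLAIN
sasl.enabled.mechanisms=PLAIN
# principal.builder.class is intentionally left unset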
Categories
machine learning

Plot a math function in Python

Hi,

I just started a recap of calculus and wanted to know how hard it is to plot functions in a programming language.

Searching this topic I found this article, which gives an elegant approach:

https://scriptverse.academy/tutorials/python-matplotlib-plot-function.html

After trying the code, here is the result.
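Since the original plot image is not reproduced here, a minimal sketch along the same lines (the function choice is mine, not from the article):

import numpy as np
import matplotlib.pyplot as plt

# sample f(x) = x^2 on [-10, 10] and plot it
x = np.linspace(-10, 10, 200)
y = x ** 2

plt.plot(x, y)
plt.xlabel('x')
plt.ylabel('f(x)')
plt.title('f(x) = x^2')
plt.grid(True)
plt.show()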

Surely there are even more complex cases, but at least this is a start for adapting the code.

Cheers