Since mostly I play with financial data and data science concepts, I think it’s a good idea to create a series called Data science in investing in which I share the small things I learn.
Python 3.10 or 3.11 (Strictly required; 3.12+ is incompatible).
Google AI Studio API Key (Free tier works).
Conda or a virtual environment manager.
1. Architecture Overview
Because OpenBB Agents are hardcoded for OpenAI’s SDK, we cannot connect Gemini directly. We use a “Bridge” architecture:
OpenBB Workspace → Local Agent Server (Port 8095) → LiteLLM Bridge (Port 4000) → Google Gemini API
2. Environment Setup (The “Bridge”)
First, we set up LiteLLM, a tool that translates OpenAI HTTP requests into Google Gemini requests.
Install LiteLLM: Bash pip install litellm
Run the Bridge: Open a dedicated terminal window (keep this running) and start the proxy. We use the Flash model because it supports the large context windows required for analyzing financial datasets. Bash # Replace with your actual model name if different export GEMINI_API_KEY=your_google_key_here litellm –model gemini/gemini-latest-flash –port 4000
Result: You now have an “OpenAI-compatible” server running at http://127.0.0.1:4000.
3. Agent Installation (The “Brain”)
The official openbb-agents package on PyPI is often outdated. We must install the agent server from the source to avoid dependency conflicts.
Install the Core Library (Editable Mode): Clone the library repository and install it so Python can find the base modules. Bash git clone https://github.com/OpenBB-finance/openbb-agents.git cd openbb-agents pip install -e . (there are some dependencies to be fixed, it will want pydantic = “==2.7.1” and you need to update it under [home_dir]/OpenBB/openbb-agents/pyproject.toml to pydantic = “>=2.7.1”)
Get the Runnable Examples: Clone the secondary repository that contains the actual server implementations. Bash cd .. # Go back to your root folder git clone https://github.com/OpenBB-finance/agents-for-openbb.git cd agents-for-openbb
4. Configuration & Launch
We will run the “Vanilla Agent” (Agent #30), which can read widgets from your dashboard. We must configure it to talk to our local Bridge (Port 4000) instead of OpenAI servers.
Navigate to the Agent: Bash cd 30-vanilla-agent-raw-widget-data
This guide outlines the professional process of moving two WordPress installations to RackNerd. By using a “Backdoor Subdomain,” we ensure the old site remains accessible for the migration “pull” even after the primary domain is pointed to the new server.
Phase 1: Preparation
Backup Everything: Perform a full backup of your files and database on the old host.
Gather Credentials:
Old Host: cPanel login or FTP/IP details.
New Host (RackNerd): Shared IP address, cPanel username, and password (found in your “Welcome Email”).
Network Solutions Access: Ensure you have login access to your domain registrar to manage DNS/Nameservers. Setup the RackNerd nameservers to be used for the domain
Phase 2: Setting Up RackNerd (Destination)
Add the Domains:
Login to RackNerd cPanel.
Go to Domains > Addon Domains.
Add your domain.
Install Fresh WordPress (Optional but helpful): Use the Softaculous installer in RackNerd to install a blank WordPress site on both domains. This ensures the database and folders are ready for Migrate Guru to overwrite.
Phase 3: The “Backdoor Subdomain” (Source Setup)
This step allows you to access the old site dashboard while the main domain is live on RackNerd.
Create the DNS Record:
In your active DNS manager (Network Solutions or RackNerd, wherever the nameservers point), add an A Record.
Host:old
Points to: The IP Address of your OLD host (you can take it from the cPanel interface).
Configure Old cPanel:
On the old host, create the subdomain old.yourdomain.com.
Crucial: Set the Document Root to point to the existing site folder (usually /public_html/).
Override WordPress URL:
On the old host, open wp-config.php via File Manager.
Add these lines at the very top (after <?php):PHPdefine( 'WP_HOME', 'http://old.yourdomain.com' ); define( 'WP_SITEURL', 'http://old.yourdomain.com' );
Troubleshooting: If it still redirects, rename your .htaccess file to .htaccess_bak on the old host.
Phase 4: Migration with Migrate Guru
Access Old Site: Log in to http://old.yourdomain.com/wp-admin.
Install Migrate Guru: Search for and activate the “Migrate Guru” plugin.
Configure Migration:
Click that it is installed on both env copy key from the new site on the old one.
Initiate: Click “Migrate.” The plugin will clone the site to RackNerd in the background.
Phase 5: Testing & Going Live
Verify that everything functions correctly
SSL Installation: In RackNerd cPanel, go to SSL/TLS Status and run “AutoSSL” to ensure your site has a valid HTTPS certificate.
Phase 6: Post-Migration Cleanup
New Server Config: On the RackNerd server, open wp-config.php and ensure those “old” subdomain lines are not present. If they were copied over, delete them.
Permalinks: In the new WordPress dashboard, go to Settings > Permalinks and click “Save Changes” twice to refresh the URL structure.
Decommission: Once you are 100% sure the migration is successful, you can cancel your old hosting account.
Here is another script that can be used to calculate the total bucket usage per project from the cloud monitoring metrics.
import time
from google.cloud import monitoring_v3
from google.protobuf.duration_pb2 import Duration
def format_bytes(bytes_value):
"""Converts bytes to a human-readable format."""
if bytes_value is None:
return "N/A"
bytes_value = float(bytes_value)
units = ['Bytes', 'KiB', 'MiB', 'GiB', 'TiB', 'PiB']
i = 0
size = bytes_value
while size >= 1024 and i < len(units) - 1:
size /= 1024
i += 1
return f"{size:.2f} {units[i]}"
def get_total_gcs_storage_for_project(project_id):
"""
Fetches the total_bytes metric, summed across ALL GCS buckets in a project.
Args:
project_id (str): Your Google Cloud project ID.
Returns:
float: The total size of all buckets in bytes, or None if no data is found.
"""
client = monitoring_v3.MetricServiceClient()
# FIX 1: Use manual path construction for older library versions
project_name = f"projects/{project_id}"
# 1. Define the Time Interval (last 2 days)
now = int(time.time()) # Get current time as integer seconds
# FIX 2: Instantiate TimeInterval and TimeStamp components directly
interval = monitoring_v3.TimeInterval(
end_time={'seconds': now}, # TimeStamp using dict for seconds
start_time={'seconds': now - 2 * 86400}
)
# 2. Define the Metric Filter
metric_filter = (
'metric.type = "storage.googleapis.com/storage/total_bytes" '
'AND resource.type = "gcs_bucket"'
)
# 3. Define Aggregation for Summing All Buckets
aggregation = monitoring_v3.Aggregation(
# FIX 3: Instantiate Duration object manually
alignment_period=Duration(seconds=86400), # 1 day
per_series_aligner=monitoring_v3.Aggregation.Aligner.ALIGN_MAX,
cross_series_reducer=monitoring_v3.Aggregation.Reducer.REDUCE_SUM,
group_by_fields=[],
)
# 4. List Time Series
try:
results = client.list_time_series(
request={
"name": project_name,
"filter": metric_filter,
"interval": interval,
"aggregation": aggregation,
"view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
}
)
except Exception as e:
print(f"Error querying Cloud Monitoring: {e}")
return None
# Process the result (should contain only one time series with one point)
for result in results:
if result.points:
total_bytes = result.points[0].value.double_value
return total_bytes
return None
# --- Configuration ---
GCP_PROJECT_ID = "PROJECT_ID" # <-- CHANGE THIS
# ---------------------
if __name__ == "__main__":
total_size_bytes = get_total_gcs_storage_for_project(GCP_PROJECT_ID)
print(f"Project ID: {GCP_PROJECT_ID}")
print("---")
if total_size_bytes is not None and total_size_bytes >= 0:
print(f"Total GCS Storage Used (Raw Bytes): {int(total_size_bytes)}")
print(f"Total GCS Storage Used (Human-readable): {format_bytes(total_size_bytes)}")
else:
print("Could not retrieve total storage metric. Check the project ID, permissions, or if Cloud Monitoring is enabled.")
I saw this use case in one of the challenge labs from Google skillboost, and it kind of intrigued me, so I decided to reproduce it.
It’s quite easy to export historic data and ratings from Financial Modeling Prep for a certain stock. In my case that was Merck, a drug producer.
So, I quickly drafted the following script:
import requests
import os
import json
import time
FMP_KEY = os.getenv("FMP_KEY")
symbol = input("Enter stock symbol: ").upper()
start_date = input("Enter start date (YYYY-MM-DD): ")
end_date = input("Enter end date (YYYY-MM-DD): ")
url = f"https://financialmodelingprep.com/stable/historical-price-eod/full?symbol={symbol}&from={start_date}&to={end_date}&apikey={FMP_KEY}"
def fetch_json(url):
try:
response = requests.get(url)
response.raise_for_status() # Raise an error for bad status codes
return response.json()
except requests.exceptions.RequestException as e:
print(f"An error occurred: {e}")
return None
data = fetch_json(url)
with open(f"{symbol}_historical_data.json", "w") as f:
for item in data:
f.write(json.dumps(item) + "\n")
date_format = "%Y-%m-%d"
t1 = time.mktime(time.strptime(start_date, date_format))
t2 = time.mktime(time.strptime(end_date, date_format))
days = int((t2 - t1) / 86400)
url_rating=f"https://financialmodelingprep.com/stable/ratings-historical?symbol={symbol}&limit={days}&apikey={FMP_KEY}"
data_rating = fetch_json(url_rating)
with open(f"{symbol}_historical_ratings.json", "w") as f:
for item in data_rating:
f.write(json.dumps(item) + "\n")
This script creates two New Line Delimiter JSON (which actually means that each JSON entry is a completely independent JSON on a line, and this is the BigQuery format supported for Schema Recognition)
After the files are exported, I manually created two tables from these files called mkr_ratings and mrk_historical
Good, now we have to denormalize some of the data and add our RECORD struct between the tables.
Not wanting to waste time, I added the schema from GUI like
And now we need to map data from historical to ratings in the following relation.
mrk_historical.close AS price.close
mrk_historical.spread AS price.spread
mrk_historical.change AS price.change
We also need a anchor field and that is in our case date, which is present in both tables.
After trying to draft it by myself and review it and improve it using a little bit of LLM help, I reached the following form, which is annoyingly simple.
UPDATE `[GCP_PROJECT]`.smallcaps.mrk_ratings AS mrk_ratings
SET
price = STRUCT(mrk_historical.close AS close, mrk_historical.spread AS spread, mrk_historical.change AS change)
FROM
`[GCP_PROJECT]`.smallcaps.mrk_historical AS mrk_historical
WHERE
mrk_ratings.date = mrk_historical.date;
So, you just update the first table and give it an alias, set the actual RECORD as a STRUCT of the required fields from the second table, and at the end match the anchor column of both aliases. Sweet!
I am currently working on a bigger article for Medium but until I can put it words so that it’s really something really worth sharing, just wanted to add on one of the steps that I learned while working.
Normally I wanted to load some info that was stored in CSV to BigQuery for analysis and filtering.
I though that you can just add the header to CSV file and it will automatically recognize it and load it.
Turns out that it’s a little bit more complicated.
Normally it should work, and since it did with my first CSV, couldn’t really understand what was wrong.
Now, there are two parts to this story:
How can you add the Table schema manually, and it will reveal the actual issue.
What do you need to be aware of and why this happens
How can you add the table schema manually
The data that is written to CSV is actually a Dataframe, so you have info about the types of the columns directly from code
dtype_mapping = {
'object': 'STRING',
'int64': 'FLOAT',
'float64': 'FLOAT'
}
schema = []
for column, dtype in df.dtypes.items():
schema.append({
'name': column,
'type': dtype_mapping.get(str(dtype), 'STRING') # Default to STRING if type is not in map
})
import json
print(json.dumps(schema, indent=2))
Yes, I know, int64 should be mapped to INTEGER, but it turns out that for my case some columns even if in Python are market as int64, in BigQuery they need to be FLOAT. I know there is more memory allocated but the dataset is quite small.
So you can easily use so that you can exclude the header.
df.to_csv(f"df.csv", index=False, mode='a')
The above piece of code will help you create a SCHEMA from a Dataframe header
What do you need to be aware of and why this happens
The actually reason why this happened is because I was not aware that somewhere in my csv file a line with the header definition still remained (yes, I actually wrote multiple dataframes with header and filtered with a Linux command, and it did not work)
And when the file loaded, I actually saw this:
Normally if that line was missing and no header, that it should had looked like
Things to be learned from this exercise:
If you have to write multiple dataframes in a CSV file, don’t add the header and use the above code to generate a specific definition of the schema
Properly check the CSV not to have rogue lines that don’t match the rest of the structure of the data, otherwise you will find out that everything is converted to string and you don’t understand why.
I will keep it short as a message since sharing screens or details is a security breach 😄:
1 Google decides to migrate 2 of you VMs in a space of 5 min (lucky us)
2 The cluster becomes unbalanced, surely you also have Kafka Manager installed locally that doesn’t make your life any easier (Tnx Yahoo)
3 You receive a message from monitoring a couple of hours later that you have offline partitions (but why?)
4 You find out that one of you node is not responding anymore so you restart it
5 After replication you see that your other node has increased IOWait and load so you take action to restart it as well
6 News flash, your partitions are not replicating anymore, but why? So you take a look on the topics to see that ISR is only on the node you restarted the first time and that is also the leader of all partitions…..wow, strange….but it isn’t, because you changed a very important parameter…..unclean.leader.election.enabled=false , and now, yes, comes the AHA moment
7 What the hell do you do because that parameters constrains Kafka from moving the leader to a node that it’s not in ISR
8 You decide that the actual issue is with the first node you restarted which is also the controller, so you restart it to force the cluster to restart replication even if you will have downtime and offline partitions
9 The cluster restarts the replication process and the other nodes are brought in ISR
10 You manually rebalance the cluster partitions so that the first node does not get overwhelmed