Update RECORD struct in BigQuery with data from other table

I saw this use case in one of the challenge labs on Google Cloud Skills Boost, and it intrigued me enough that I decided to reproduce it.

It’s quite easy to export historical price data and ratings for a given stock from Financial Modeling Prep. In my case that was Merck (MRK), a pharmaceutical company.

So, I quickly drafted the following script:

import json
import os
import time

import requests

FMP_KEY = os.getenv("FMP_KEY")

symbol = input("Enter stock symbol: ").upper()
start_date = input("Enter start date (YYYY-MM-DD): ")
end_date = input("Enter end date (YYYY-MM-DD): ")


def fetch_json(url):
    try:
        response = requests.get(url)
        response.raise_for_status()  # Raise an error for bad status codes
        return response.json()
    except requests.exceptions.RequestException as e:
        print(f"An error occurred: {e}")
        return None


# Historical end-of-day prices for the requested date range
url = f"https://financialmodelingprep.com/stable/historical-price-eod/full?symbol={symbol}&from={start_date}&to={end_date}&apikey={FMP_KEY}"
data = fetch_json(url)
if data:
    with open(f"{symbol}_historical_data.json", "w") as f:
        for item in data:
            f.write(json.dumps(item) + "\n")

# Number of days in the range, used as the limit for the ratings endpoint
date_format = "%Y-%m-%d"
t1 = time.mktime(time.strptime(start_date, date_format))
t2 = time.mktime(time.strptime(end_date, date_format))
days = int((t2 - t1) / 86400)

url_rating = f"https://financialmodelingprep.com/stable/ratings-historical?symbol={symbol}&limit={days}&apikey={FMP_KEY}"
data_rating = fetch_json(url_rating)
if data_rating:
    with open(f"{symbol}_historical_ratings.json", "w") as f:
        for item in data_rating:
            f.write(json.dumps(item) + "\n")

This script creates two newline-delimited JSON (NDJSON) files, which means that each line is a complete, self-contained JSON object. This is the format BigQuery supports for schema auto-detection.
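
For illustration, each line of MRK_historical_data.json ends up looking something like this (the values below are made up and the exact field set depends on the FMP response; the fields the UPDATE later relies on are date, close, spread and change):

{"symbol": "MRK", "date": "2024-01-03", "close": 108.42, "spread": 1.15, "change": -0.37}
{"symbol": "MRK", "date": "2024-01-02", "close": 108.79, "spread": 0.98, "change": 0.52}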

After the files were exported, I manually created two tables from them, called mrk_ratings and mrk_historical.
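
I did this through the console, but the same load can be scripted. Here is a minimal sketch with the google-cloud-bigquery Python client, assuming the smallcaps dataset from the UPDATE statement below and [GCP_PROJECT] as a placeholder project ID:

from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=True,  # let BigQuery infer the schema from the NDJSON lines
)

for filename, table in [
    ("MRK_historical_data.json", "mrk_historical"),
    ("MRK_historical_ratings.json", "mrk_ratings"),
]:
    with open(filename, "rb") as f:
        job = client.load_table_from_file(
            f, f"[GCP_PROJECT].smallcaps.{table}", job_config=job_config
        )
    job.result()  # wait for the load job to finish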

Good, now we have to denormalize some of the data: add a RECORD field to the ratings table and fill it with values from the historical table.

Not wanting to waste time, I added the new field to the schema from the GUI:

[
    {
        "name": "price",
        "type": "RECORD",
        "mode": "NULLABLE",
        "fields": [
            {
                "name": "close",
                "type": "FLOAT",
                "mode": "NULLABLE"
            },
            {
                "name": "spread",
                "type": "FLOAT",
                "mode": "NULLABLE"
            },
            {
                "name": "change",
                "type": "FLOAT",
                "mode": "NULLABLE"
            }
        ]
    }
]
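
The same schema change can be scripted as well; a minimal sketch with the Python client (again assuming the [GCP_PROJECT].smallcaps dataset) would look like this:

from google.cloud import bigquery

client = bigquery.Client()
table = client.get_table("[GCP_PROJECT].smallcaps.mrk_ratings")

# Append the nested price RECORD to the existing schema
schema = list(table.schema)
schema.append(
    bigquery.SchemaField(
        "price",
        "RECORD",
        mode="NULLABLE",
        fields=[
            bigquery.SchemaField("close", "FLOAT", mode="NULLABLE"),
            bigquery.SchemaField("spread", "FLOAT", mode="NULLABLE"),
            bigquery.SchemaField("change", "FLOAT", mode="NULLABLE"),
        ],
    )
)
table.schema = schema
client.update_table(table, ["schema"])  # adding NULLABLE fields is a valid schema update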

In the end the schemas looked like this:

(screenshots of the mrk_historical and mrk_ratings schemas)

And now we need to map data from the historical table into the ratings table with the following relation:

mrk_historical.close AS price.close

mrk_historical.spread AS price.spread

mrk_historical.change AS price.change

We also need an anchor field, which in our case is date, a column present in both tables.
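
Before running the update, it's worth a quick sanity check that the anchor actually lines up. A hypothetical check like this counts ratings rows with no matching price row (rows the UPDATE below would leave with a NULL price):

from google.cloud import bigquery

client = bigquery.Client()
query = """
SELECT COUNT(*) AS unmatched
FROM `[GCP_PROJECT]`.smallcaps.mrk_ratings AS r
LEFT JOIN `[GCP_PROJECT]`.smallcaps.mrk_historical AS h
  ON r.date = h.date
WHERE h.date IS NULL
"""
for row in client.query(query).result():
    print(f"Ratings rows without a matching price row: {row.unmatched}")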

After drafting it myself, then reviewing and improving it with a little bit of LLM help, I reached the following form, which is annoyingly simple.

UPDATE `[GCP_PROJECT]`.smallcaps.mrk_ratings AS mrk_ratings
SET
  price = STRUCT(mrk_historical.close AS close, mrk_historical.spread AS spread, mrk_historical.change AS change)
FROM
  `[GCP_PROJECT]`.smallcaps.mrk_historical AS mrk_historical
WHERE
  mrk_ratings.date = mrk_historical.date;

So, you just update the first table and give it an alias, set the RECORD to a STRUCT built from the required fields of the second table, and at the end match the anchor column of the two aliases. Sweet!
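
To confirm the records actually landed, a quick illustrative look at the nested field:

from google.cloud import bigquery

client = bigquery.Client()
query = """
SELECT date, price.close, price.spread, price.change
FROM `[GCP_PROJECT]`.smallcaps.mrk_ratings
WHERE price.close IS NOT NULL
ORDER BY date DESC
LIMIT 5
"""
for row in client.query(query).result():
    print(row)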