fix: drop read_only=True to avoid openpyxl dimension truncation

In read_only mode, openpyxl stops iterating at the range declared in the sheet's
cached <dimension ref> attribute in the XML. If MTZ extended the spreadsheet
beyond the original row range, the extra rows were silently ignored (hence always
~4000 products regardless of the real count). Removing read_only=True makes
openpyxl read all actual data rows. The file is already in BytesIO, so there is
no I/O penalty.
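
To make the mismatch described above concrete, here is a minimal diagnostic
sketch using only the standard library. It compares the last row named in
<dimension ref> with the last row actually present in <sheetData>. The
SHEET_XML sample and both helper names are hypothetical and written for
illustration; they are not part of this repository or of openpyxl.

```python
import re
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace used by worksheet parts in .xlsx files.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

# Hypothetical, hand-written worksheet XML: the cached dimension claims
# 2 rows, but sheetData actually holds 4.
SHEET_XML = """<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <dimension ref="A1:A2"/>
  <sheetData>
    <row r="1"><c r="A1"><v>1</v></c></row>
    <row r="2"><c r="A2"><v>2</v></c></row>
    <row r="3"><c r="A3"><v>3</v></c></row>
    <row r="4"><c r="A4"><v>4</v></c></row>
  </sheetData>
</worksheet>"""


def declared_last_row(root):
    """Return the last row number named in <dimension ref>, or None if absent."""
    dim = root.find("m:dimension", NS)
    if dim is None:
        return None
    last_cell = dim.get("ref", "").split(":")[-1]  # "A1:A2" -> "A2"
    match = re.search(r"(\d+)$", last_cell)
    return int(match.group(1)) if match else None


def actual_last_row(root):
    """Return the number of the last <row> element inside <sheetData>."""
    last = 0
    for row in root.findall("m:sheetData/m:row", NS):
        r = row.get("r")
        # A row without an explicit r attribute follows the previous row.
        last = int(r) if r is not None else last + 1
    return last


root = ET.fromstring(SHEET_XML)
declared = declared_last_row(root)
actual = actual_last_row(root)
if declared is not None and actual > declared:
    print(f"dimension says {declared} rows, sheetData has {actual}")
```

For a real download, the worksheet XML would first have to be extracted from
the .xlsx archive with zipfile; the part name varies per workbook, so that step
is left out of this sketch.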

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Malin
2026-03-22 20:51:08 +01:00
parent 8d75fcd060
commit 3b167cd396

@@ -55,7 +55,10 @@ def download_and_parse():
     global products_cache, last_refresh
     resp = requests.get(EXCEL_URL, timeout=60)
     resp.raise_for_status()
-    wb = load_workbook(BytesIO(resp.content), read_only=True, data_only=True)
+    # read_only=True would stop at the sheet's declared dimension attribute, silently
+    # missing any rows MTZ added beyond the original range. Since the file is already
+    # in memory (BytesIO), read_only gives no I/O benefit and data_only=True suffices.
+    wb = load_workbook(BytesIO(resp.content), data_only=True)
     ws = wb.active
     rows = list(ws.iter_rows(min_row=6, values_only=True))
     parsed = []