fix: drop read_only=True to avoid openpyxl dimension truncation
openpyxl in read_only mode stops iterating at the sheet's cached <dimension ref> attribute in the XML. If MTZ extended the Excel beyond the original row range, those rows were silently ignored (hence always ~4000 products regardless of the real count). Removing read_only=True forces openpyxl to read all actual data rows. The file is already in BytesIO so there is no I/O penalty. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
5
main.py
5
main.py
@@ -55,7 +55,10 @@ def download_and_parse():
|
||||
global products_cache, last_refresh
|
||||
resp = requests.get(EXCEL_URL, timeout=60)
|
||||
resp.raise_for_status()
|
||||
wb = load_workbook(BytesIO(resp.content), read_only=True, data_only=True)
|
||||
# read_only=True would stop at the sheet's declared dimension attribute, silently
|
||||
# missing any rows MTZ added beyond the original range. Since the file is already
|
||||
# in memory (BytesIO), read_only gives no I/O benefit and data_only=True suffices.
|
||||
wb = load_workbook(BytesIO(resp.content), data_only=True)
|
||||
ws = wb.active
|
||||
rows = list(ws.iter_rows(min_row=6, values_only=True))
|
||||
parsed = []
|
||||
|
||||
Reference in New Issue
Block a user