12 Commits

Author    SHA1        Message                     Date
Yuvi9587  d9364f4f91  commit                      2025-08-14 09:48:55 -07:00
Yuvi9587  9cd48bb63a  Update main_window.py       2025-08-13 19:49:10 -07:00
Yuvi9587  d0f11c4a06  Commit                      2025-08-13 19:38:33 -07:00
Yuvi9587  26fa3b9bc1  Commit                      2025-08-10 09:16:31 -07:00
Yuvi9587  f7c4d892a8  commit                      2025-08-07 21:42:04 -07:00
Yuvi9587  661b97aa16  Commit                      2025-08-06 06:56:49 -07:00
Yuvi9587  3704fece2b  Update main_window.py       2025-08-04 04:53:52 -07:00
Yuvi9587  bdb7ac93c4  Update readme.md            2025-08-03 09:16:25 -07:00
Yuvi9587  76d4a3ea8a  Update main_window.py       2025-08-03 09:15:01 -07:00
Yuvi9587  ccc7804505  Update readme.md            2025-08-03 09:13:47 -07:00
Yuvi9587  4ee750c5d4  Update drive_downloader.py  2025-08-03 09:11:27 -07:00
Yuvi9587  e9be13c4e3  Update readme.md            2025-08-03 09:07:29 -07:00
11 changed files with 1152 additions and 515 deletions

145
readme.md

@@ -1,4 +1,4 @@
<h1 align="center">Kemono Downloader v6.0.0</h1> <h1 align="center">Kemono Downloader </h1>
<div align="center"> <div align="center">
@@ -41,108 +41,53 @@ Built with PyQt5, this tool is designed for users who want deep filtering capabi
</div> </div>
<h2><strong>Core Capabilities Overview</strong></h2>
--- <h3><strong>High-Performance Downloading</strong></h3>
<ul>
<li><strong>Multi-threading:</strong> Processes multiple posts simultaneously to greatly accelerate downloads from large creator profiles.</li>
<li><strong>Multi-part Downloading:</strong> Splits large files into chunks and downloads them in parallel to maximize speed.</li>
<li><strong>Resilience:</strong> Supports pausing, resuming, and restoring downloads after crashes or interruptions.</li>
</ul>
## Feature Overview <h3><strong>Advanced Filtering & Content Control</strong></h3>
<ul>
<li><strong>Content Type Filtering:</strong> Select whether to download all files or limit to images, videos, audio, or archives only.</li>
<li><strong>Keyword Skipping:</strong> Automatically skips posts or files containing certain keywords (e.g., "WIP", "sketch").</li>
<li><strong>Character Filtering:</strong> Restricts downloads to posts that match specific character or series names.</li>
</ul>
Kemono Downloader offers a range of features to streamline your content downloading experience: <h3><strong>File Organization & Renaming</strong></h3>
<ul>
<li><strong>Automated Subfolders:</strong> Automatically organizes downloaded files into subdirectories based on character names or per post.</li>
<li><strong>Advanced File Renaming:</strong> Flexible renaming options, especially in Manga Mode, including:
<ul>
<li><strong>Post Title:</strong> Uses the post's title (e.g., <code>Chapter-One.jpg</code>).</li>
<li><strong>Date + Original Name:</strong> Prepends the publication date to the original filename.</li>
<li><strong>Date + Title:</strong> Combines the date with the post title.</li>
<li><strong>Sequential Numbering (Date Based):</strong> Simple sequence numbers (e.g., <code>001.jpg</code>, <code>002.jpg</code>).</li>
<li><strong>Title + Global Numbering:</strong> Uses post title with a globally incrementing number across the session.</li>
<li><strong>Post ID:</strong> Names files using the post's unique ID.</li>
</ul>
</li>
</ul>
- **User-Friendly Interface:** A modern PyQt5 GUI for easy navigation and operation. <h3><strong>Specialized Modes</strong></h3>
<ul>
<li><strong>Manga/Comic Mode:</strong> Sorts posts chronologically before downloading to ensure pages appear in the correct sequence.</li>
<li><strong>Favorite Mode:</strong> Connects to your account and downloads from your favorites list (artists or posts).</li>
<li><strong>Link Extraction Mode:</strong> Extracts external links from posts for export or targeted downloading.</li>
<li><strong>Text Extraction Mode:</strong> Saves post descriptions or comment sections as <code>PDF</code>, <code>DOCX</code>, or <code>TXT</code> files.</li>
</ul>
- **Flexible Downloading:** <h3><strong>Utility & Advanced Features</strong></h3>
- Download content from Kemono.su (and mirrors) and Coomer.party (and mirrors). <ul>
- Supports creator pages (with page range selection) and individual post URLs. <li><strong>Cookie Support:</strong> Enables access to subscriber-only content via browser session cookies.</li>
- Standard download controls: Start, Pause, Resume, and Cancel. <li><strong>Duplicate Detection:</strong> Prevents saving duplicate files using content-based comparison, with configurable limits.</li>
<li><strong>Image Compression:</strong> Automatically converts large images to <code>.webp</code> to reduce disk usage.</li>
- **Powerful Filtering:** <li><strong>Creator Management:</strong> Built-in creator browser and update checker for downloading only new posts from saved profiles.</li>
- **Character Filtering:** Filter content by character names. Supports simple comma-separated names and grouped names for shared folders. <li><strong>Error Handling:</strong> Tracks failed downloads and provides a retry dialog with options to export or redownload missing files.</li>
- **Keyword Skipping:** Skip posts or files based on specified keywords. </ul>
- **Filename Cleaning:** Remove unwanted words or phrases from downloaded filenames.
- **File Type Selection:** Choose to download all files, or limit to images/GIFs, videos, audio, or archives. Can also extract external links only.
- **Customizable Downloads:**
- **Thumbnails Only:** Option to download only small preview images.
- **Content Scanning:** Scan post HTML for `<img>` tags and direct image links, useful for images embedded in descriptions.
- **WebP Conversion:** Convert images to WebP format for smaller file sizes (requires Pillow library).
- **Organized Output:**
- **Automatic Subfolders:** Create subfolders based on character names (from filters or `Known.txt`) or post titles.
- **Per-Post Subfolders:** Option to create an additional subfolder for each individual post.
- **Manga/Comic Mode:**
- Downloads posts from a creator's feed in chronological order (oldest to newest).
- Offers various filename styling options for sequential reading (e.g., post title, original name, global numbering).
- **⭐ Favorite Mode:**
- Directly download from your favorited artists and posts on Kemono.su.
- Requires a valid cookie and adapts the UI for easy selection from your favorites.
- Supports downloading into a single location or artist-specific subfolders.
- **Performance & Advanced Options:**
- **Cookie Support:** Use cookies (paste string or load from `cookies.txt`) to access restricted content.
- **Multithreading:** Configure the number of simultaneous downloads/post processing threads for improved speed.
- **Logging:**
- A detailed progress log displays download activity, errors, and summaries.
- **Multi-language Interface:** Choose from several languages for the UI (English, Japanese, French, Spanish, German, Russian, Korean, Chinese Simplified).
- **Theme Customization:** Selectable Light and Dark themes for user comfort.
---
## ✨ What's New in v6.0.0
This release focuses on providing more granular control over file organization and improving at-a-glance status monitoring.
### New Features
- **Live Error Count on Button**
The **"Error" button** now dynamically displays the number of failed files during a download. Instead of opening the dialog, you can quickly see a live count like `(3) Error`, helping you track issues at a glance.
- **Date Prefix for Post Subfolders**
A new checkbox labeled **"Date Prefix"** is now available in the advanced settings.
When enabled alongside **"Subfolder per Post"**, it prepends the post's upload date to the folder name (e.g., `2025-07-11 Post Title`).
This makes your downloads sortable and easier to browse chronologically.
- **Keep Duplicates Within a Post**
A **"Keep Duplicates"** option has been added to preserve all files from a post — even if some have the same name.
Instead of skipping or overwriting, the downloader will save duplicates with numbered suffixes (e.g., `image.jpg`, `image_1.jpg`, etc.), which is especially useful when the same file name points to different media.
### Bug Fixes
- The downloader now correctly renames large `.part` files when completed, avoiding leftover temp files.
- The list of failed files shown in the Error Dialog is now saved and restored with your session — so no errors get lost if you close the app.
- Your selected download location is remembered, even after pressing the **Reset** button.
- The **Cancel** button is now enabled when restoring a pending session, so you can abort stuck jobs more easily.
- Internal cleanup logs (like "Deleting post cache") are now excluded from the final download summary for clarity.
---
## 📅 Next Update Plans
### 🔖 Post Tag Filtering (Planned for v6.1.0)
A powerful new **"Filter by Post Tags"** feature is planned:
- Filter and download content based on specific post tags.
- Combine tag filtering with current filters (character, file type, etc.).
- Use tag presets to automate frequent downloads.
This will provide **much greater control** over what gets downloaded, especially for creators who use tags consistently.
### 📁 Creator Download History (.json Save)
To streamline incremental downloads, a new system will allow the app to:
- Save a `.json` file with metadata about already-downloaded posts.
- Compare that file on future runs, so only **new** posts are downloaded.
- Avoids duplication and makes regular syncs fast and efficient.
Ideal for users managing large collections or syncing favorites regularly.
---
## 💻 Installation ## 💻 Installation
@@ -154,7 +99,7 @@ Ideal for users managing large collections or syncing favorites regularly.
### Install Dependencies ### Install Dependencies
```bash ```bash
pip install PyQt5 requests Pillow mega.py pip install PyQt5 requests Pillow mega.py fpdf2 python-docx
``` ```
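The updated `pip install` line adds `fpdf2` and `python-docx`, which the readme ties to PDF/DOCX text export, and the worker code further down already treats some libraries as optional (`Document = None` when the import fails). A startup check could report which of these packages are missing. This is a sketch, not part of the project: `missing_packages` and `REQUIRED_MODULES` are hypothetical names, and the mapping assumes each package's usual import name (`Pillow` imports as `PIL`, `mega.py` as `mega`, `fpdf2` as `fpdf`, `python-docx` as `docx`).

```python
import importlib.util
from typing import Dict, List

# pip package name -> top-level module name it installs (assumed usual names)
REQUIRED_MODULES: Dict[str, str] = {
    "PyQt5": "PyQt5",
    "requests": "requests",
    "Pillow": "PIL",
    "mega.py": "mega",
    "fpdf2": "fpdf",
    "python-docx": "docx",
}


def missing_packages(modules: Dict[str, str]) -> List[str]:
    """Return the pip names whose module cannot be found, without importing it."""
    return [
        pip_name
        for pip_name, module_name in modules.items()
        if importlib.util.find_spec(module_name) is None
    ]
```

Calling `missing_packages(REQUIRED_MODULES)` once at launch would let the app warn up front instead of failing on the first PDF or DOCX export.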
### Running the Application ### Running the Application
@@ -197,7 +142,7 @@ Feel free to fork this repo and submit pull requests for bug fixes, new features
## License ## License
This project is under the Custom Licence This project is under the MIT Licence
## Star History ## Star History
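The readme's Duplicate Detection bullet describes content-based comparison, and the worker diff further down builds an MD5 digest while streaming a file (`md5_hasher = hashlib.md5()`). The sketch below shows how such a streamed digest could back a session-wide duplicate check; `md5_of_chunks` and `DuplicateTracker` are hypothetical names, and the "configurable limits" mentioned in the readme are not modelled.

```python
import hashlib
from typing import Iterable, Set


def md5_of_chunks(chunks: Iterable[bytes]) -> str:
    """Hash a file incrementally, the way a streamed download would feed it."""
    hasher = hashlib.md5()
    for chunk in chunks:
        hasher.update(chunk)
    return hasher.hexdigest()


class DuplicateTracker:
    """Hypothetical helper: remembers content hashes already saved in this session."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()

    def is_duplicate(self, file_hash: str) -> bool:
        """Return True if this content was seen before; otherwise record it."""
        if file_hash in self._seen:
            return True
        self._seen.add(file_hash)
        return False
```

Because the digest depends only on the bytes, identically named files with different content are both kept, while renamed copies of the same content are caught.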


@@ -60,6 +60,7 @@ DOWNLOAD_LOCATION_KEY = "downloadLocationV1"
RESOLUTION_KEY = "window_resolution" RESOLUTION_KEY = "window_resolution"
UI_SCALE_KEY = "ui_scale_factor" UI_SCALE_KEY = "ui_scale_factor"
SAVE_CREATOR_JSON_KEY = "saveCreatorJsonProfile" SAVE_CREATOR_JSON_KEY = "saveCreatorJsonProfile"
FETCH_FIRST_KEY = "fetchAllPostsFirst"
# --- UI Constants and Identifiers --- # --- UI Constants and Identifiers ---
HTML_PREFIX = "<!HTML!>" HTML_PREFIX = "<!HTML!>"
@@ -97,7 +98,7 @@ FOLDER_NAME_STOP_WORDS = {
"for", "he", "her", "his", "i", "im", "in", "is", "it", "its", "for", "he", "her", "his", "i", "im", "in", "is", "it", "its",
"me", "my", "net", "not", "of", "on", "or", "org", "our", "me", "my", "net", "not", "of", "on", "or", "org", "our",
"s", "she", "so", "the", "their", "they", "this", "s", "she", "so", "the", "their", "they", "this",
"to", "ve", "was", "we", "were", "with", "www", "you", "your", "to", "ve", "was", "we", "were", "with", "www", "you", "your", "nsfw", "sfw",
# add more according to need # add more according to need
} }
@@ -111,7 +112,9 @@ CREATOR_DOWNLOAD_DEFAULT_FOLDER_IGNORE_WORDS = {
"may", "jun", "june", "jul", "july", "aug", "august", "sep", "september", "may", "jun", "june", "jul", "july", "aug", "august", "sep", "september",
"oct", "october", "nov", "november", "dec", "december", "oct", "october", "nov", "november", "dec", "december",
"mon", "monday", "tue", "tuesday", "wed", "wednesday", "thu", "thursday", "mon", "monday", "tue", "tuesday", "wed", "wednesday", "thu", "thursday",
"fri", "friday", "sat", "saturday", "sun", "sunday" "fri", "friday", "sat", "saturday", "sun", "sunday", "Pack", "tier", "spoiler",
# add more according to need # add more according to need
} }
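The diff only extends the two word sets; how they are consulted is not shown. Assuming a candidate word is lowercased before the lookup (a common pattern, and an assumption here), every set entry must itself be lowercase, so the mixed-case `"Pack"` entry added above would never match. The hypothetical `folder_words` helper below makes that visible.

```python
import re
from typing import List, Set


def folder_words(title: str, stop_words: Set[str]) -> List[str]:
    """Keep the words of `title` whose lowercase form is not in `stop_words`.

    Hypothetical helper: assumes lookups are case-insensitive on the title side only.
    """
    words = re.findall(r"[A-Za-z0-9]+", title)
    return [word for word in words if word.lower() not in stop_words]
```

With `{"tier", "Pack"}` as the set, `folder_words("Tier 2 Pack", ...)` drops `Tier` but keeps `Pack`, because `"pack"` is not in the set.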


@@ -1,7 +1,7 @@
import time import time
import traceback import traceback
from urllib.parse import urlparse from urllib.parse import urlparse
import json # Ensure json is imported import json
import requests import requests
from ..utils.network_utils import extract_post_info, prepare_cookies_for_request from ..utils.network_utils import extract_post_info, prepare_cookies_for_request
from ..config.constants import ( from ..config.constants import (
@@ -41,9 +41,14 @@ def fetch_posts_paginated(api_url_base, headers, offset, logger, cancellation_ev
try: try:
response = requests.get(paginated_url, headers=headers, timeout=(15, 60), cookies=cookies_dict) response = requests.get(paginated_url, headers=headers, timeout=(15, 60), cookies=cookies_dict)
response.raise_for_status() response.raise_for_status()
response.encoding = 'utf-8'
return response.json() return response.json()
except requests.exceptions.RequestException as e: except requests.exceptions.RequestException as e:
if e.response is not None and e.response.status_code == 400:
logger(f" ✅ Reached end of posts (API returned 400 Bad Request for offset {offset}).")
return []
logger(f" ⚠️ Retryable network error on page fetch (Attempt {attempt + 1}): {e}") logger(f" ⚠️ Retryable network error on page fetch (Attempt {attempt + 1}): {e}")
if attempt < max_retries - 1: if attempt < max_retries - 1:
delay = retry_delay * (2 ** attempt) delay = retry_delay * (2 ** attempt)
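The hunk above treats HTTP 400 at a given offset as the end of the feed and retries other network errors with exponential backoff (`retry_delay * (2 ** attempt)`). A network-free sketch of that control flow follows. It is an illustration, not the project's code: `fetch` stands in for the HTTP call, `EndOfPosts` is a hypothetical exception for the 400 case, `ConnectionError` stands in for retryable errors, and the defaults `max_retries=3`, `retry_delay=5.0` are assumptions since the hunk does not show them.

```python
import time
from typing import Callable, List, Optional


class EndOfPosts(Exception):
    """Hypothetical stand-in for 'the API answered 400 Bad Request for this offset'."""


def backoff_delays(retry_delay: float, max_retries: int) -> List[float]:
    """Sleep durations between attempts; there is no sleep after the last attempt."""
    return [retry_delay * (2 ** attempt) for attempt in range(max_retries - 1)]


def fetch_page_with_retries(
    fetch: Callable[[int], list],
    offset: int,
    max_retries: int = 3,
    retry_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[list]:
    """Return the page, [] at the end of the feed, or None if every attempt failed."""
    for attempt in range(max_retries):
        try:
            return fetch(offset)
        except EndOfPosts:
            return []  # end of posts: stop paginating instead of retrying
        except ConnectionError:
            if attempt < max_retries - 1:
                sleep(retry_delay * (2 ** attempt))  # 5 s, then 10 s, ...
    return None
```

Injecting `sleep` keeps the backoff testable without real waits.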
@@ -81,9 +86,12 @@ def fetch_single_post_data(api_domain, service, user_id, post_id, headers, logge
response_body += chunk response_body += chunk
full_post_data = json.loads(response_body) full_post_data = json.loads(response_body)
if isinstance(full_post_data, list) and full_post_data: if isinstance(full_post_data, list) and full_post_data:
return full_post_data[0] return full_post_data[0]
return full_post_data if isinstance(full_post_data, dict) and 'post' in full_post_data:
return full_post_data['post']
return full_post_data
except Exception as e: except Exception as e:
logger(f" ❌ Failed to fetch full content for post {post_id}: {e}") logger(f" ❌ Failed to fetch full content for post {post_id}: {e}")
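After this change, `fetch_single_post_data` accepts three payload shapes: a non-empty list (first element is the post), an object wrapping the post under `'post'`, or the post object itself. The same normalization as a standalone, testable function; `unwrap_post_payload` is a hypothetical name and the sample payloads are illustrative.

```python
from typing import Any


def unwrap_post_payload(payload: Any) -> Any:
    """Return the post object from any of the response shapes handled above."""
    if isinstance(payload, list) and payload:
        return payload[0]  # list response: take the first post
    if isinstance(payload, dict) and 'post' in payload:
        return payload['post']  # wrapped response: {'post': {...}, ...}
    return payload  # already a bare post object (or an empty list)
```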
@@ -101,6 +109,7 @@ def fetch_post_comments(api_domain, service, user_id, post_id, headers, logger,
try: try:
response = requests.get(comments_api_url, headers=headers, timeout=(10, 30), cookies=cookies_dict) response = requests.get(comments_api_url, headers=headers, timeout=(10, 30), cookies=cookies_dict)
response.raise_for_status() response.raise_for_status()
response.encoding = 'utf-8'
return response.json() return response.json()
except requests.exceptions.RequestException as e: except requests.exceptions.RequestException as e:
raise RuntimeError(f"Error fetching comments for post {post_id}: {e}") raise RuntimeError(f"Error fetching comments for post {post_id}: {e}")
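Several hunks add `response.encoding = 'utf-8'` before `response.json()`. When `response.encoding` is set, `requests` decodes the body with it, and it can end up as something other than UTF-8; for example, `requests` falls back to ISO-8859-1 for `text/*` responses that declare no charset. The snippet shows, without any network access, what that does to a non-ASCII title; the sample body is made up.

```python
import json

# A UTF-8 encoded JSON body, as a server might send it (sample data).
body = json.dumps({"title": "日本語のタイトル"}, ensure_ascii=False).encode("utf-8")


def decode_title(raw: bytes, charset: str) -> str:
    """Decode the raw body with `charset`, then parse the JSON."""
    return json.loads(raw.decode(charset))["title"]


right = decode_title(body, "utf-8")
wrong = decode_title(body, "iso-8859-1")  # every byte decodes, so no error is raised
```

The wrong charset fails silently: parsing succeeds and the mojibake flows into filenames and folder names, which is why pinning the encoding is worthwhile.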
@@ -120,7 +129,8 @@ def download_from_api(
selected_cookie_file=None, selected_cookie_file=None,
app_base_dir=None, app_base_dir=None,
manga_filename_style_for_sort_check=None, manga_filename_style_for_sort_check=None,
processed_post_ids=None processed_post_ids=None,
fetch_all_first=False
): ):
headers = { headers = {
'User-Agent': 'Mozilla/5.0', 'User-Agent': 'Mozilla/5.0',
@@ -140,12 +150,9 @@ def download_from_api(
parsed_input_url_for_domain = urlparse(api_url_input) parsed_input_url_for_domain = urlparse(api_url_input)
api_domain = parsed_input_url_for_domain.netloc api_domain = parsed_input_url_for_domain.netloc
# --- START: MODIFIED LOGIC ---
# This list is updated to include the new .cr and .st mirrors for validation.
if not any(d in api_domain.lower() for d in ['kemono.su', 'kemono.party', 'kemono.cr', 'coomer.su', 'coomer.party', 'coomer.st']): if not any(d in api_domain.lower() for d in ['kemono.su', 'kemono.party', 'kemono.cr', 'coomer.su', 'coomer.party', 'coomer.st']):
logger(f"⚠️ Unrecognized domain '{api_domain}' from input URL. Defaulting to kemono.su for API calls.") logger(f"⚠️ Unrecognized domain '{api_domain}' from input URL. Defaulting to kemono.su for API calls.")
api_domain = "kemono.su" api_domain = "kemono.su"
# --- END: MODIFIED LOGIC ---
cookies_for_api = None cookies_for_api = None
if use_cookie and app_base_dir: if use_cookie and app_base_dir:
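The domain guard above uses a substring test (`d in api_domain.lower()`), which also accepts hosts such as `kemono.su.example.com` or `notkemono.su`. Whether that matters depends on how `api_domain` is used afterwards. A stricter alternative compares the parsed hostname for an exact or subdomain match. This is a suggested variant, not the project's code; `is_known_domain` is a hypothetical name.

```python
from urllib.parse import urlparse

KNOWN_DOMAINS = ("kemono.su", "kemono.party", "kemono.cr",
                 "coomer.su", "coomer.party", "coomer.st")


def is_known_domain(url: str) -> bool:
    """True if the URL's host is a known domain or a subdomain of one."""
    host = urlparse(url).hostname or ""  # .hostname is lowercased and excludes the port
    return any(host == domain or host.endswith("." + domain) for domain in KNOWN_DOMAINS)
```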
@@ -159,6 +166,7 @@ def download_from_api(
try: try:
direct_response = requests.get(direct_post_api_url, headers=headers, timeout=(10, 30), cookies=cookies_for_api) direct_response = requests.get(direct_post_api_url, headers=headers, timeout=(10, 30), cookies=cookies_for_api)
direct_response.raise_for_status() direct_response.raise_for_status()
direct_response.encoding = 'utf-8'
direct_post_data = direct_response.json() direct_post_data = direct_response.json()
if isinstance(direct_post_data, list) and direct_post_data: if isinstance(direct_post_data, list) and direct_post_data:
direct_post_data = direct_post_data[0] direct_post_data = direct_post_data[0]
@@ -183,7 +191,8 @@ def download_from_api(
logger("⚠️ Page range (start/end page) is ignored when a specific post URL is provided (searching all pages for the post).") logger("⚠️ Page range (start/end page) is ignored when a specific post URL is provided (searching all pages for the post).")
is_manga_mode_fetch_all_and_sort_oldest_first = manga_mode and (manga_filename_style_for_sort_check != STYLE_DATE_POST_TITLE) and not target_post_id is_manga_mode_fetch_all_and_sort_oldest_first = manga_mode and (manga_filename_style_for_sort_check != STYLE_DATE_POST_TITLE) and not target_post_id
api_base_url = f"https://{api_domain}/api/v1/{service}/user/{user_id}" should_fetch_all = fetch_all_first or is_manga_mode_fetch_all_and_sort_oldest_first
api_base_url = f"https://{api_domain}/api/v1/{service}/user/{user_id}/posts"
page_size = 50 page_size = 50
if is_manga_mode_fetch_all_and_sort_oldest_first: if is_manga_mode_fetch_all_and_sort_oldest_first:
logger(f" Manga Mode (Style: {manga_filename_style_for_sort_check if manga_filename_style_for_sort_check else 'Default'} - Oldest First Sort Active): Fetching all posts to sort by date...") logger(f" Manga Mode (Style: {manga_filename_style_for_sort_check if manga_filename_style_for_sort_check else 'Default'} - Oldest First Sort Active): Fetching all posts to sort by date...")
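When `should_fetch_all` is true (the new `fetch_all_first` flag or oldest-first Manga Mode), every page is collected before processing so the posts can be sorted by date. The sorting step might look like the sketch below. It assumes each post has a naive ISO-8601 `'published'` timestamp without a `Z` suffix; that field name and format are assumptions, not confirmed by this diff.

```python
from datetime import datetime
from typing import Dict, List, Tuple


def sort_posts_oldest_first(posts: List[Dict]) -> List[Dict]:
    """Sort posts by an assumed 'published' timestamp; undated posts go last."""

    def key(post: Dict) -> Tuple[int, datetime]:
        stamp = post.get('published') or ''
        try:
            return (0, datetime.fromisoformat(stamp))
        except ValueError:
            return (1, datetime.max)  # missing or unparsable date

    return sorted(posts, key=key)  # stable: equal dates keep their fetch order
```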


@@ -0,0 +1,80 @@
import time
import requests
import json
from urllib.parse import urlparse
def fetch_server_channels(server_id, logger, cookies=None, cancellation_event=None, pause_event=None):
    """
    Fetches the list of channels for a given Discord server ID from the Kemono API.
    The lookup can be paused and cancelled through the optional events.
    """
    domains_to_try = ["kemono.cr", "kemono.su"]
    for domain in domains_to_try:
        if cancellation_event and cancellation_event.is_set():
            logger(" Channel fetching cancelled by user.")
            return None
        while pause_event and pause_event.is_set():
            if cancellation_event and cancellation_event.is_set(): break
            time.sleep(0.5)
        # The pause loop exits early on cancellation, so re-check before sending a request.
        if cancellation_event and cancellation_event.is_set():
            logger(" Channel fetching cancelled by user.")
            return None
        lookup_url = f"https://{domain}/api/v1/discord/channel/lookup/{server_id}"
        logger(f" Attempting to fetch channel list from: {lookup_url}")
        try:
            response = requests.get(lookup_url, cookies=cookies, timeout=15)
            response.raise_for_status()
            channels = response.json()
            if isinstance(channels, list):
                logger(f" ✅ Found {len(channels)} channels for server {server_id}.")
                return channels
        except (requests.exceptions.RequestException, json.JSONDecodeError):
            # Silent failure: fall through and try the next domain
            pass
    logger(f" ❌ Failed to fetch channel list for server {server_id} from all available domains.")
    return None
def fetch_channel_messages(channel_id, logger, cancellation_event, pause_event, cookies=None):
    """
    Fetches all messages from a Discord channel by looping through API pages (pagination).
    Uses a page size of 150 and handles the specific offset logic.
    """
    offset = 0
    page_size = 150  # Messages per page returned by this endpoint (determined empirically)
    api_base_url = f"https://kemono.cr/api/v1/discord/channel/{channel_id}"
    while not (cancellation_event and cancellation_event.is_set()):
        if pause_event and pause_event.is_set():
            logger(" Message fetching paused...")
            while pause_event.is_set():
                if cancellation_event and cancellation_event.is_set(): break
                time.sleep(0.5)
            logger(" Message fetching resumed.")
        if cancellation_event and cancellation_event.is_set():
            break
        paginated_url = f"{api_base_url}?o={offset}"
        logger(f" Fetching messages from API: page starting at offset {offset}")
        try:
            response = requests.get(paginated_url, cookies=cookies, timeout=20)
            response.raise_for_status()
            messages_batch = response.json()
            if not messages_batch:
                logger(f" ✅ Reached end of messages for channel {channel_id}.")
                break
            logger(f" Fetched {len(messages_batch)} messages...")
            yield messages_batch
            if len(messages_batch) < page_size:
                logger(f" ✅ Last page of messages received for channel {channel_id}.")
                break
            offset += page_size
            time.sleep(0.5)
        except (requests.exceptions.RequestException, json.JSONDecodeError) as e:
            logger(f" ❌ Error fetching messages at offset {offset}: {e}")
            break
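`fetch_channel_messages` stops on an empty page or on a page shorter than `page_size`, and otherwise advances the offset by one full page. The same stopping rules, stripped of HTTP, pausing and logging, make the edge cases easy to check. `paginate` is a hypothetical name and `fetch_page` stands in for the API request.

```python
import threading
from typing import Callable, Iterator, List, Optional


def paginate(
    fetch_page: Callable[[int], List[dict]],
    page_size: int = 150,
    cancellation_event: Optional[threading.Event] = None,
) -> Iterator[List[dict]]:
    """Yield pages until an empty or short page arrives, or until cancelled."""
    offset = 0
    while not (cancellation_event and cancellation_event.is_set()):
        batch = fetch_page(offset)
        if not batch:
            break  # empty page: nothing left
        yield batch
        if len(batch) < page_size:
            break  # short page: this was the last one
        offset += page_size
```

When the total is an exact multiple of the page size, one extra request returns an empty page; the short-page check saves that request in every other case.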


@@ -37,7 +37,7 @@ try:
except ImportError: except ImportError:
Document = None Document = None
from PyQt5 .QtCore import Qt ,QThread ,pyqtSignal ,QMutex ,QMutexLocker ,QObject ,QTimer ,QSettings ,QStandardPaths ,QCoreApplication ,QUrl ,QSize ,QProcess from PyQt5 .QtCore import Qt ,QThread ,pyqtSignal ,QMutex ,QMutexLocker ,QObject ,QTimer ,QSettings ,QStandardPaths ,QCoreApplication ,QUrl ,QSize ,QProcess
from .api_client import download_from_api, fetch_post_comments from .api_client import download_from_api, fetch_post_comments, fetch_single_post_data
from ..services.multipart_downloader import download_file_in_parts, MULTIPART_DOWNLOADER_AVAILABLE from ..services.multipart_downloader import download_file_in_parts, MULTIPART_DOWNLOADER_AVAILABLE
from ..services.drive_downloader import ( from ..services.drive_downloader import (
download_mega_file, download_gdrive_file, download_dropbox_file download_mega_file, download_gdrive_file, download_dropbox_file
@@ -124,7 +124,8 @@ class PostProcessorWorker:
processed_post_ids=None, processed_post_ids=None,
multipart_scope='both', multipart_scope='both',
multipart_parts_count=4, multipart_parts_count=4,
multipart_min_size_mb=100 multipart_min_size_mb=100,
skip_file_size_mb=None
): ):
self.post = post_data self.post = post_data
self.download_root = download_root self.download_root = download_root
@@ -189,6 +190,7 @@ class PostProcessorWorker:
self.multipart_scope = multipart_scope self.multipart_scope = multipart_scope
self.multipart_parts_count = multipart_parts_count self.multipart_parts_count = multipart_parts_count
self.multipart_min_size_mb = multipart_min_size_mb self.multipart_min_size_mb = multipart_min_size_mb
self.skip_file_size_mb = skip_file_size_mb
if self.compress_images and Image is None: if self.compress_images and Image is None:
self.logger("⚠️ Image compression disabled: Pillow library not found.") self.logger("⚠️ Image compression disabled: Pillow library not found.")
self.compress_images = False self.compress_images = False
@@ -276,7 +278,25 @@ class PostProcessorWorker:
cookies_to_use_for_file = None cookies_to_use_for_file = None
if self.use_cookie: if self.use_cookie:
cookies_to_use_for_file = prepare_cookies_for_request(self.use_cookie, self.cookie_text, self.selected_cookie_file, self.app_base_dir, self.logger) cookies_to_use_for_file = prepare_cookies_for_request(self.use_cookie, self.cookie_text, self.selected_cookie_file, self.app_base_dir, self.logger)
if self.skip_file_size_mb is not None:
api_original_filename_for_size_check = file_info.get('_original_name_for_log', file_info.get('name'))
try:
# A HEAD request returns only the headers, so the body is never downloaded
with requests.head(file_url, headers=file_download_headers, timeout=15, cookies=cookies_to_use_for_file, allow_redirects=True) as head_response:
head_response.raise_for_status()
content_length = head_response.headers.get('Content-Length')
if content_length:
file_size_bytes = int(content_length)
file_size_mb = file_size_bytes / (1024 * 1024)
if file_size_mb < self.skip_file_size_mb:
self.logger(f" -> Skip File (Size): '{api_original_filename_for_size_check}' is {file_size_mb:.2f} MB, which is smaller than the {self.skip_file_size_mb} MB limit.")
return 0, 1, api_original_filename_for_size_check, False, FILE_DOWNLOAD_STATUS_SKIPPED, None
else:
self.logger(f" ⚠️ Could not determine file size for '{api_original_filename_for_size_check}' to check against size limit. Proceeding with download.")
except requests.RequestException as e:
self.logger(f" ⚠️ Could not fetch file headers to check size for '{api_original_filename_for_size_check}': {e}. Proceeding with download.")
api_original_filename = file_info.get('_original_name_for_log', file_info.get('name')) api_original_filename = file_info.get('_original_name_for_log', file_info.get('name'))
filename_to_save_in_main_path = "" filename_to_save_in_main_path = ""
if forced_filename_override: if forced_filename_override:
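The new size filter reads `Content-Length`, converts bytes to MB with `1024 * 1024`, and skips files strictly smaller than `skip_file_size_mb`; when the size is unknown it proceeds with the download. Note that `requests.head` does not follow redirects unless `allow_redirects=True` is passed, as the worker does. The decision alone, as a pure function (hypothetical name); treating `"0"` as unknown mirrors the later `total_size_bytes > 0` guard rather than the HEAD branch.

```python
from typing import Optional

BYTES_PER_MB = 1024 * 1024


def should_skip_by_size(content_length: Optional[str], min_size_mb: float) -> Optional[bool]:
    """True: smaller than the limit, so skip. False: download. None: size unknown, proceed."""
    if not content_length:
        return None
    try:
        size_bytes = int(content_length)
    except ValueError:
        return None  # malformed header
    if size_bytes <= 0:
        return None  # no usable size information
    return size_bytes / BYTES_PER_MB < min_size_mb
```

A file exactly at the limit is downloaded, because the comparison is a strict `<`.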
@@ -488,19 +508,18 @@ class PostProcessorWorker:
except requests.RequestException as e: except requests.RequestException as e:
self.logger(f" ⚠️ Could not verify size of existing file '{filename_to_save_in_main_path}': {e}. Proceeding with download.") self.logger(f" ⚠️ Could not verify size of existing file '{filename_to_save_in_main_path}': {e}. Proceeding with download.")
max_retries = 3
retry_delay = 5 retry_delay = 5
downloaded_size_bytes = 0 downloaded_size_bytes = 0
calculated_file_hash = None calculated_file_hash = None
downloaded_part_file_path = None downloaded_part_file_path = None
total_size_bytes = 0
download_successful_flag = False download_successful_flag = False
last_exception_for_retry_later = None last_exception_for_retry_later = None
is_permanent_error = False is_permanent_error = False
data_to_write_io = None data_to_write_io = None
response_for_this_attempt = None
for attempt_num_single_stream in range(max_retries + 1): for attempt_num_single_stream in range(max_retries + 1):
response_for_this_attempt = None response = None
if self._check_pause(f"File download attempt for '{api_original_filename}'"): break if self._check_pause(f"File download attempt for '{api_original_filename}'"): break
if self.check_cancel() or (skip_event and skip_event.is_set()): break if self.check_cancel() or (skip_event and skip_event.is_set()): break
try: try:
@@ -519,12 +538,24 @@ class PostProcessorWorker:
new_url = self._find_valid_subdomain(current_url_to_try) new_url = self._find_valid_subdomain(current_url_to_try)
if new_url != current_url_to_try: if new_url != current_url_to_try:
self.logger(f" Retrying with new URL: {new_url}") self.logger(f" Retrying with new URL: {new_url}")
file_url = new_url # Update the main file_url for subsequent retries file_url = new_url
response.close() # Close the old response
response = requests.get(new_url, headers=file_download_headers, timeout=(30, 300), stream=True, cookies=cookies_to_use_for_file) response = requests.get(new_url, headers=file_download_headers, timeout=(30, 300), stream=True, cookies=cookies_to_use_for_file)
response.raise_for_status() response.raise_for_status()
# --- REVISED AND MOVED SIZE CHECK LOGIC ---
total_size_bytes = int(response.headers.get('Content-Length', 0)) total_size_bytes = int(response.headers.get('Content-Length', 0))
if self.skip_file_size_mb is not None:
if total_size_bytes > 0:
file_size_mb = total_size_bytes / (1024 * 1024)
if file_size_mb < self.skip_file_size_mb:
self.logger(f" -> Skip File (Size): '{api_original_filename}' is {file_size_mb:.2f} MB, which is smaller than the {self.skip_file_size_mb} MB limit.")
return 0, 1, api_original_filename, False, FILE_DOWNLOAD_STATUS_SKIPPED, None
# If Content-Length is missing, we can't check, so we no longer log a warning here and just proceed.
# --- END OF REVISED LOGIC ---
num_parts_for_file = min(self.multipart_parts_count, MAX_PARTS_FOR_MULTIPART_DOWNLOAD) num_parts_for_file = min(self.multipart_parts_count, MAX_PARTS_FOR_MULTIPART_DOWNLOAD)
file_is_eligible_by_scope = False file_is_eligible_by_scope = False
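Once a file qualifies for multipart download, `total_size_bytes` has to be split across `num_parts_for_file` parallel requests. One common way is to divide the size into contiguous inclusive byte ranges for HTTP `Range: bytes=start-end` headers, giving the remainder to the first parts. This is a generic sketch; the diff does not show how `download_file_in_parts` actually splits the file, and both function names are hypothetical.

```python
from typing import Dict, List, Tuple


def split_byte_ranges(total_size: int, num_parts: int) -> List[Tuple[int, int]]:
    """Split [0, total_size) into contiguous inclusive (start, end) ranges."""
    if total_size <= 0 or num_parts <= 0:
        return []
    num_parts = min(num_parts, total_size)  # never create empty parts
    base, extra = divmod(total_size, num_parts)
    ranges, start = [], 0
    for i in range(num_parts):
        length = base + (1 if i < extra else 0)  # first `extra` parts get one more byte
        ranges.append((start, start + length - 1))  # inclusive end, as HTTP ranges are
        start += length
    return ranges


def range_header(start: int, end: int) -> Dict[str, str]:
    """Build the request header for one part."""
    return {"Range": f"bytes={start}-{end}"}
```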
@@ -548,9 +579,7 @@ class PostProcessorWorker:
if self._check_pause(f"Multipart decision for '{api_original_filename}'"): break if self._check_pause(f"Multipart decision for '{api_original_filename}'"): break
if attempt_multipart: if attempt_multipart:
if response_for_this_attempt: response.close() # Close the initial connection before starting multipart
response_for_this_attempt.close()
response_for_this_attempt = None
mp_save_path_for_unique_part_stem_arg = os.path.join(target_folder_path, f"{unique_part_file_stem_on_disk}{temp_file_ext_for_unique_part}") mp_save_path_for_unique_part_stem_arg = os.path.join(target_folder_path, f"{unique_part_file_stem_on_disk}{temp_file_ext_for_unique_part}")
mp_success, mp_bytes, mp_hash, mp_file_handle = download_file_in_parts( mp_success, mp_bytes, mp_hash, mp_file_handle = download_file_in_parts(
file_url, mp_save_path_for_unique_part_stem_arg, total_size_bytes, num_parts_for_file, file_download_headers, api_original_filename, file_url, mp_save_path_for_unique_part_stem_arg, total_size_bytes, num_parts_for_file, file_download_headers, api_original_filename,
@@ -576,7 +605,6 @@ class PostProcessorWorker:
current_attempt_downloaded_bytes = 0 current_attempt_downloaded_bytes = 0
md5_hasher = hashlib.md5() md5_hasher = hashlib.md5()
last_progress_time = time.time() last_progress_time = time.time()
single_stream_exception = None
try: try:
with open(current_single_stream_part_path, 'wb') as f_part: with open(current_single_stream_part_path, 'wb') as f_part:
for chunk in response.iter_content(chunk_size=1 * 1024 * 1024): for chunk in response.iter_content(chunk_size=1 * 1024 * 1024):
@@ -643,8 +671,8 @@ class PostProcessorWorker:
is_permanent_error = True is_permanent_error = True
break break
finally: finally:
if response_for_this_attempt: if response is not None:
response_for_this_attempt.close() response.close()
self._emit_signal('file_download_status', False) self._emit_signal('file_download_status', False)
final_total_for_progress = total_size_bytes if download_successful_flag and total_size_bytes > 0 else downloaded_size_bytes final_total_for_progress = total_size_bytes if download_successful_flag and total_size_bytes > 0 else downloaded_size_bytes
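One subtlety when closing responses in a `finally` block: `requests.Response` defines `__bool__` to return `response.ok`, so a 404 or 500 response is falsy. A guard written as `if response:` therefore skips closing exactly the error responses that `raise_for_status()` just rejected; `is not None` closes them too (the pagination helper already tests `e.response is not None`). The stand-in class below only mimics that truthiness, so the behaviour can be checked without a network.

```python
class FakeResponse:
    """Stand-in mimicking requests.Response truthiness: bool(resp) == resp.ok."""

    def __init__(self, status_code: int) -> None:
        self.status_code = status_code
        self.closed = False

    @property
    def ok(self) -> bool:
        return self.status_code < 400

    def __bool__(self) -> bool:
        return self.ok

    def close(self) -> None:
        self.closed = True


def cleanup_truthy(resp) -> None:
    if resp:  # false for error responses, so they are never closed
        resp.close()


def cleanup_is_not_none(resp) -> None:
    if resp is not None:  # closes every response that was actually created
        resp.close()
```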
@@ -826,37 +854,91 @@ class PostProcessorWorker:
 return 0, 1, filename_to_save_in_main_path, was_original_name_kept_flag, FILE_DOWNLOAD_STATUS_FAILED_RETRYABLE_LATER, details_for_failure
 def process(self):
+# --- START: REFACTORED PROCESS METHOD ---
+# 1. DATA MAPPING: Map Discord Message or Creator Post fields to a consistent set of variables.
+if self.service == 'discord':
+# For Discord, self.post is a MESSAGE object from the API.
+post_title = self.post.get('content', '') or f"Message {self.post.get('id', 'N/A')}"
+post_id = self.post.get('id', 'unknown_id')
+post_main_file_info = {} # Discord messages don't have a single main file
+post_attachments = self.post.get('attachments', [])
+post_content_html = self.post.get('content', '')
+post_data = self.post # Keep a reference to the original message object
+log_prefix = "Message"
+else:
+# Existing logic for standard creator posts
+post_title = self.post.get('title', '') or 'untitled_post'
+post_id = self.post.get('id', 'unknown_id')
+post_main_file_info = self.post.get('file')
+post_attachments = self.post.get('attachments', [])
+post_content_html = self.post.get('content', '')
+post_data = self.post # Reference to the post object
+log_prefix = "Post"
+# --- FIX: FETCH FULL POST DATA IF CONTENT IS MISSING BUT NEEDED ---
+content_is_needed = (
+self.show_external_links or
+self.extract_links_only or
+self.scan_content_for_images or
+(self.filter_mode == 'text_only' and self.text_only_scope == 'content')
+)
+if content_is_needed and self.post.get('content') is None and self.service != 'discord':
+self.logger(f" Post {post_id} is missing 'content' field, fetching full data...")
+parsed_url = urlparse(self.api_url_input)
+api_domain = parsed_url.netloc
+headers = {'User-Agent': 'Mozilla/5.0'}
+cookies = prepare_cookies_for_request(self.use_cookie, self.cookie_text, self.selected_cookie_file, self.app_base_dir, self.logger, target_domain=api_domain)
+full_post_data = fetch_single_post_data(api_domain, self.service, self.user_id, post_id, headers, self.logger, cookies_dict=cookies)
+if full_post_data:
+self.logger(" ✅ Full post data fetched successfully.")
+# Update the worker's post object with the complete data
+self.post = full_post_data
+# Re-initialize local variables from the new, complete post data
+post_title = self.post.get('title', '') or 'untitled_post'
+post_main_file_info = self.post.get('file')
+post_attachments = self.post.get('attachments', [])
+post_content_html = self.post.get('content', '')
+post_data = self.post
+else:
+self.logger(f" ⚠️ Failed to fetch full content for post {post_id}. Content-dependent features may not work for this post.")
+# --- END FIX ---
+# 2. SHARED PROCESSING LOGIC: The rest of the function now uses the consistent variables from above.
 result_tuple = (0, 0, [], [], [], None, None)
+total_downloaded_this_post = 0
+total_skipped_this_post = 0
+determined_post_save_path_for_history = self.override_output_dir if self.override_output_dir else self.download_root
 try:
-if self._check_pause(f"Post processing for ID {self.post.get('id', 'N/A')}"):
-result_tuple = (0, 0, [], [], [], None, None)
-return result_tuple
+if self._check_pause(f"{log_prefix} processing for ID {post_id}"):
+return (0, 0, [], [], [], None, None)
 if self.check_cancel():
-result_tuple = (0, 0, [], [], [], None, None)
-return result_tuple
+return (0, 0, [], [], [], None, None)
 current_character_filters = self._get_current_character_filters()
 kept_original_filenames_for_log = []
 retryable_failures_this_post = []
 permanent_failures_this_post = []
-total_downloaded_this_post = 0
-total_skipped_this_post = 0
 history_data_for_this_post = None
 parsed_api_url = urlparse(self.api_url_input)
-post_data = self.post
-post_id = post_data.get('id', 'unknown_id')
-post_page_url = f"https://{parsed_api_url.netloc}/{self.service}/user/{self.user_id}/post/{post_id}"
+# CONTEXT-AWARE URL for Referer Header
+if self.service == 'discord':
+server_id = self.user_id
+channel_id = self.post.get('channel', 'unknown_channel')
+post_page_url = f"https://{parsed_api_url.netloc}/discord/server/{server_id}/{channel_id}"
+else:
+post_page_url = f"https://{parsed_api_url.netloc}/{self.service}/user/{self.user_id}/post/{post_id}"
 headers = {'User-Agent': 'Mozilla/5.0', 'Referer': post_page_url, 'Accept': '*/*'}
 link_pattern = re.compile(r"""<a\s+.*?href=["'](https?://[^"']+)["'][^>]*>(.*?)</a>""", re.IGNORECASE | re.DOTALL)
-post_data = self.post
-post_title = post_data.get('title', '') or 'untitled_post'
-post_id = post_data.get('id', 'unknown_id')
-post_main_file_info = post_data.get('file')
-post_attachments = post_data.get('attachments', [])
 effective_unwanted_keywords_for_folder_naming = self.unwanted_keywords.copy()
 is_full_creator_download_no_char_filter = not self.target_post_id_from_initial_url and not current_character_filters
@@ -874,9 +956,9 @@ class PostProcessorWorker:
 self.logger(f" Applying creator download specific folder ignore words ({len(self.creator_download_folder_ignore_words)} words).")
 effective_unwanted_keywords_for_folder_naming.update(self.creator_download_folder_ignore_words)
-post_content_html = post_data.get('content', '')
 if not self.extract_links_only:
-self.logger(f"\n--- Processing Post {post_id} ('{post_title[:50]}...') (Thread: {threading.current_thread().name}) ---")
+self.logger(f"\n--- Processing {log_prefix} {post_id} ('{post_title[:50]}...') (Thread: {threading.current_thread().name}) ---")
 num_potential_files_in_post = len(post_attachments or []) + (1 if post_main_file_info and post_main_file_info.get('path') else 0)
 post_is_candidate_by_title_char_match = False
@@ -920,7 +1002,7 @@ class PostProcessorWorker:
 if original_api_att_name:
 all_files_from_post_api_for_char_check.append({'_original_name_for_log': original_api_att_name})
-if current_character_filters and self.char_filter_scope == CHAR_SCOPE_COMMENTS:
+if current_character_filters and self.char_filter_scope == CHAR_SCOPE_COMMENTS and self.service != 'discord':
 self.logger(f" [Char Scope: Comments] Phase 1: Checking post files for matches before comments for post ID '{post_id}'.")
 if self._check_pause(f"File check (comments scope) for post {post_id}"):
 result_tuple = (0, num_potential_files_in_post, [], [], [], None, None)
@@ -943,7 +1025,7 @@ class PostProcessorWorker:
 if post_is_candidate_by_file_char_match_in_comment_scope: break
 self.logger(f" [Char Scope: Comments] Phase 1 Result: post_is_candidate_by_file_char_match_in_comment_scope = {post_is_candidate_by_file_char_match_in_comment_scope}")
-if current_character_filters and self.char_filter_scope == CHAR_SCOPE_COMMENTS:
+if current_character_filters and self.char_filter_scope == CHAR_SCOPE_COMMENTS and self.service != 'discord':
 if not post_is_candidate_by_file_char_match_in_comment_scope:
 if self._check_pause(f"Comment check for post {post_id}"):
 result_tuple = (0, num_potential_files_in_post, [], [], [], None, None)
@@ -1007,10 +1089,10 @@ class PostProcessorWorker:
 return result_tuple
 if not self.extract_links_only and self.manga_mode_active and current_character_filters and (self.char_filter_scope == CHAR_SCOPE_TITLE or self.char_filter_scope == CHAR_SCOPE_BOTH) and not post_is_candidate_by_title_char_match:
 self.logger(f" -> Skip Post (Manga Mode with Title/Both Scope - No Title Char Match): Title '{post_title[:50]}' doesn't match filters.")
 self._emit_signal('missed_character_post', post_title, "Manga Mode: No title match for character filter (Title/Both scope)")
 result_tuple = (0, num_potential_files_in_post, [], [], [], None, None)
 return result_tuple
 if not isinstance(post_attachments, list):
 self.logger(f"⚠️ Corrupt attachment data for post {post_id} (expected list, got {type(post_attachments)}). Skipping attachments.")
@@ -1143,29 +1225,50 @@ class PostProcessorWorker:
 suffix_counter = 0
 final_post_subfolder_name = ""
-while True:
+suffix_counter = 0
+folder_creation_successful = False
+final_post_subfolder_name = ""
+post_id_for_folder = str(self.post.get('id', 'unknown_id'))
+while not folder_creation_successful:
 if suffix_counter == 0:
 name_candidate = original_cleaned_post_title_for_sub
 else:
 name_candidate = f"{original_cleaned_post_title_for_sub}_{suffix_counter}"
 potential_post_subfolder_path = os.path.join(base_path_for_post_subfolder, name_candidate)
-try:
-os.makedirs(potential_post_subfolder_path, exist_ok=False)
-final_post_subfolder_name = name_candidate
-if suffix_counter > 0:
-self.logger(f" Post subfolder name conflict: Using '{final_post_subfolder_name}' instead of '{original_cleaned_post_title_for_sub}' to avoid mixing posts.")
-break
-except FileExistsError:
-suffix_counter += 1
-if suffix_counter > 100:
-self.logger(f" ⚠️ Exceeded 100 attempts to find unique subfolder name for '{original_cleaned_post_title_for_sub}'. Using UUID.")
-final_post_subfolder_name = f"{original_cleaned_post_title_for_sub}_{uuid.uuid4().hex[:8]}"
-os.makedirs(os.path.join(base_path_for_post_subfolder, final_post_subfolder_name), exist_ok=True)
+id_file_path = os.path.join(potential_post_subfolder_path, f".postid_{post_id_for_folder}")
+if not os.path.isdir(potential_post_subfolder_path):
+# Folder does not exist, create it and its ID file
+try:
+os.makedirs(potential_post_subfolder_path)
+with open(id_file_path, 'w') as f:
+f.write(post_id_for_folder)
+final_post_subfolder_name = name_candidate
+folder_creation_successful = True
+if suffix_counter > 0:
+self.logger(f" Post subfolder name conflict: Using '{final_post_subfolder_name}' to avoid mixing posts.")
+except OSError as e_mkdir:
+self.logger(f" ❌ Error creating directory '{potential_post_subfolder_path}': {e_mkdir}.")
+final_post_subfolder_name = original_cleaned_post_title_for_sub
 break
-except OSError as e_mkdir:
-self.logger(f" ❌ Error creating directory '{potential_post_subfolder_path}': {e_mkdir}. Files for this post might be saved in parent or fail.")
-final_post_subfolder_name = original_cleaned_post_title_for_sub
-break
+else:
+# Folder exists, check if it's for this post or a different one
+if os.path.exists(id_file_path):
+# ID file matches! This is a restore scenario. Reuse the folder.
+self.logger(f" Re-using existing post subfolder: '{name_candidate}'")
+final_post_subfolder_name = name_candidate
+folder_creation_successful = True
+else:
+# Folder exists but ID file does not match (or is missing). This is a normal name collision.
+suffix_counter += 1
+if suffix_counter > 100: # Safety break
+self.logger(f" ⚠️ Exceeded 100 attempts to find unique subfolder for '{original_cleaned_post_title_for_sub}'.")
+final_post_subfolder_name = f"{original_cleaned_post_title_for_sub}_{uuid.uuid4().hex[:8]}"
+os.makedirs(os.path.join(base_path_for_post_subfolder, final_post_subfolder_name), exist_ok=True)
+break
 determined_post_save_path_for_history = os.path.join(base_path_for_post_subfolder, final_post_subfolder_name)
 if self.skip_words_list and (self.skip_words_scope == SKIP_SCOPE_POSTS or self.skip_words_scope == SKIP_SCOPE_BOTH):
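The reworked subfolder loop in this hunk stops treating every existing folder as a name collision. Each post folder it creates gets a hidden `.postid_<id>` marker file, so a resumed download can recognize and reuse its own folder instead of creating `Title_1`. Below is a minimal standalone sketch of that decision. It assumes a hypothetical helper, `choose_post_subfolder`, which is not part of the project. Unlike the worker, the sketch lets an `OSError` from `os.makedirs` propagate instead of falling back to the unsuffixed title.

```python
import os
import uuid


def choose_post_subfolder(base_dir: str, title: str, post_id: str, max_attempts: int = 100) -> str:
    """Return a subfolder name under base_dir for this post (hypothetical helper).

    A hidden ".postid_<id>" marker records which post owns a folder, so a
    resumed download reuses its own folder instead of adding a suffix.
    """
    for suffix in range(max_attempts + 1):
        name = title if suffix == 0 else f"{title}_{suffix}"
        path = os.path.join(base_dir, name)
        marker = os.path.join(path, f".postid_{post_id}")
        if not os.path.isdir(path):
            # Free name: claim it by creating the folder and its marker file.
            os.makedirs(path)
            with open(marker, "w", encoding="utf-8") as fh:
                fh.write(str(post_id))
            return name
        if os.path.exists(marker):
            # Folder already belongs to this post (e.g. a resumed session): reuse it.
            return name
        # Folder belongs to another post (or has no marker): try the next suffix.
    # Every numbered name was taken: fall back to a random 8-hex-digit suffix.
    name = f"{title}_{uuid.uuid4().hex[:8]}"
    os.makedirs(os.path.join(base_dir, name), exist_ok=True)
    return name
```

Folders created before this change have no marker file. Under this scheme they count as belonging to another post, so the next numbered name is used.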
@@ -1214,7 +1317,6 @@ class PostProcessorWorker:
 parsed_url = urlparse(self.api_url_input)
 api_domain = parsed_url.netloc
 cookies = prepare_cookies_for_request(self.use_cookie, self.cookie_text, self.selected_cookie_file, self.app_base_dir, self.logger, target_domain=api_domain)
-from .api_client import fetch_single_post_data
 full_data = fetch_single_post_data(api_domain, self.service, self.user_id, post_id, headers, self.logger, cookies_dict=cookies)
 if full_data:
 final_post_data = full_data
@@ -1807,14 +1909,23 @@ class PostProcessorWorker:
 permanent_failures_this_post, history_data_for_this_post,
 None)
+except Exception as main_thread_err:
+self.logger(f"\n❌ Critical error within Worker process for {log_prefix} {post_id}: {main_thread_err}")
+self.logger(traceback.format_exc())
+# Ensure we still return a valid tuple to prevent the app from stalling
+result_tuple = (0, 1, [], [], [{'error': str(main_thread_err)}], None, None)
 finally:
+# This block ALWAYS executes, ensuring that every task signals its completion.
+# This is critical for the main thread to know when all work is done.
 if not self.extract_links_only and self.use_post_subfolders and total_downloaded_this_post == 0:
 path_to_check_for_emptiness = determined_post_save_path_for_history
 try:
+# Check if the path is a directory and if it's empty
 if os.path.isdir(path_to_check_for_emptiness) and not os.listdir(path_to_check_for_emptiness):
 self.logger(f" 🗑️ Removing empty post-specific subfolder: '{path_to_check_for_emptiness}'")
 os.rmdir(path_to_check_for_emptiness)
 except OSError as e_rmdir:
+# Log if removal fails for any reason (e.g., permissions)
 self.logger(f" ⚠️ Could not remove potentially empty subfolder '{path_to_check_for_emptiness}': {e_rmdir}")
 self._emit_signal('worker_finished', result_tuple)
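The new `except`/`finally` pair in `process` ensures that `worker_finished` is emitted once per task, even when the body crashes. Python runs `finally` after an early `return` inside `try` as well, so the new `return (0, 0, [], [], [], None, None)` pause and cancel exits still reach the signal. The sketch below isolates that pattern with a plain callback standing in for the Qt signal. `run_and_always_signal` is a hypothetical name, and the failure tuple simply copies the worker's seven-slot shape.

```python
import traceback
from typing import Callable, Tuple

# Default result with the same seven-slot shape the worker uses.
EMPTY_RESULT: Tuple = (0, 0, [], [], [], None, None)


def run_and_always_signal(
    task: Callable[[], Tuple],
    on_finished: Callable[[Tuple], None],
    log: Callable[[str], None] = print,
) -> Tuple:
    """Run task() and call on_finished exactly once, even if task raises."""
    result = EMPTY_RESULT
    try:
        result = task()
    except Exception as exc:
        # Report the crash, then substitute a failure tuple so whoever counts
        # finished tasks still sees this one complete.
        log(f"Critical error in task: {exc}")
        log(traceback.format_exc())
        result = (0, 1, [], [], [{"error": str(exc)}], None, None)
    finally:
        # Runs after a normal return, an early return, or a handled exception.
        on_finished(result)
    return result
```

If the completion signal were emitted only on the success path, a single crashing post could leave the main thread waiting forever for a task count that never reaches zero.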
@@ -1881,7 +1992,10 @@ class DownloadThread(QThread):
 single_pdf_mode=False,
 project_root_dir=None,
 processed_post_ids=None,
-start_offset=0):
+start_offset=0,
+fetch_first=False,
+skip_file_size_mb=None
+):
 super().__init__()
 self.api_url_input = api_url_input
 self.output_dir = output_dir
@@ -1947,6 +2061,8 @@ class DownloadThread(QThread):
 self.project_root_dir = project_root_dir
 self.processed_post_ids_set = set(processed_post_ids) if processed_post_ids is not None else set()
 self.start_offset = start_offset
+self.fetch_first = fetch_first
+self.skip_file_size_mb = skip_file_size_mb
 if self.compress_images and Image is None:
 self.logger("⚠️ Image compression disabled: Pillow library not found (DownloadThread).")
@@ -1993,7 +2109,8 @@ class DownloadThread(QThread):
 selected_cookie_file=self.selected_cookie_file,
 app_base_dir=self.app_base_dir,
 manga_filename_style_for_sort_check=self.manga_filename_style if self.manga_mode_active else None,
-processed_post_ids=self.processed_post_ids_set
+processed_post_ids=self.processed_post_ids_set,
+fetch_all_first=self.fetch_first
 )
 for posts_batch_data in post_generator:
@@ -2066,6 +2183,7 @@ class DownloadThread(QThread):
 'single_pdf_mode': self.single_pdf_mode,
 'multipart_parts_count': self.multipart_parts_count,
 'multipart_min_size_mb': self.multipart_min_size_mb,
+'skip_file_size_mb': self.skip_file_size_mb,
 'project_root_dir': self.project_root_dir,
 }
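The `DownloadThread` changes in this file pass a new `fetch_first` flag into the post generator as `fetch_all_first`. The settings tooltip added in this commit describes the option as finding all of a creator's posts before any download starts: slower to start, but with a more accurate progress bar. The generator below is a hypothetical sketch of that trade-off over an offset-paginated source. `iter_post_batches`, the `fetch_page(offset)` callback, and the 50-post page size are all assumptions, not the project's actual generator.

```python
from typing import Callable, Iterator, List, Optional


def iter_post_batches(
    fetch_page: Callable[[int], Optional[List[dict]]],
    page_size: int = 50,
    fetch_all_first: bool = False,
) -> Iterator[List[dict]]:
    """Yield batches of posts from an offset-paginated source.

    fetch_all_first=False: each page is yielded as soon as it is fetched.
    fetch_all_first=True: every page is fetched first and one combined batch
    is yielded, so its length is the creator's total post count.
    """

    def pages() -> Iterator[List[dict]]:
        offset = 0
        while True:
            batch = fetch_page(offset)
            if not batch:  # an empty page (or None) ends pagination
                return
            yield batch
            offset += page_size

    if not fetch_all_first:
        yield from pages()
        return

    all_posts = [post for batch in pages() for post in batch]
    if all_posts:
        yield all_posts
```

In streaming mode, the first batch can be processed while later pages are still unfetched. In fetch-first mode, the caller knows the total post count, and therefore the progress-bar denominator, before processing anything.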


@@ -7,8 +7,6 @@ import base64
 import time
 from urllib.parse import urlparse, urlunparse, parse_qs, urlencode
-# --- Third-Party Library Imports ---
-# Make sure to install these: pip install requests pycryptodome gdown
 import requests
 try:
@@ -23,11 +21,8 @@ try:
 except ImportError:
 GDRIVE_AVAILABLE = False
-# --- Constants ---
 MEGA_API_URL = "https://g.api.mega.co.nz"
-# --- Helper Functions (Original and New) ---
 def _get_filename_from_headers(headers):
 """
 Extracts a filename from the Content-Disposition header.


@@ -16,7 +16,8 @@ from ..main_window import get_app_icon_object
 from ...config.constants import (
 THEME_KEY, LANGUAGE_KEY, DOWNLOAD_LOCATION_KEY,
 RESOLUTION_KEY, UI_SCALE_KEY, SAVE_CREATOR_JSON_KEY,
-COOKIE_TEXT_KEY, USE_COOKIE_KEY
+COOKIE_TEXT_KEY, USE_COOKIE_KEY,
+FETCH_FIRST_KEY ### ADDED ###
 )
@@ -36,7 +37,7 @@ class FutureSettingsDialog(QDialog):
 screen_height = QApplication.primaryScreen().availableGeometry().height() if QApplication.primaryScreen() else 800
 scale_factor = screen_height / 800.0
-base_min_w, base_min_h = 420, 360 # Adjusted height for new layout
+base_min_w, base_min_h = 420, 390
 scaled_min_w = int(base_min_w * scale_factor)
 scaled_min_h = int(base_min_h * scale_factor)
 self.setMinimumSize(scaled_min_w, scaled_min_h)
@@ -49,7 +50,6 @@ class FutureSettingsDialog(QDialog):
 """Initializes all UI components and layouts for the dialog."""
 main_layout = QVBoxLayout(self)
-# --- Group 1: Interface Settings ---
 self.interface_group_box = QGroupBox()
 interface_layout = QGridLayout(self.interface_group_box)
@@ -76,36 +76,32 @@ class FutureSettingsDialog(QDialog):
 main_layout.addWidget(self.interface_group_box)
-# --- Group 2: Download & Window Settings ---
 self.download_window_group_box = QGroupBox()
 download_window_layout = QGridLayout(self.download_window_group_box)
-# Window Size (Resolution)
 self.window_size_label = QLabel()
 self.resolution_combo_box = QComboBox()
 self.resolution_combo_box.currentIndexChanged.connect(self._display_setting_changed)
 download_window_layout.addWidget(self.window_size_label, 0, 0)
 download_window_layout.addWidget(self.resolution_combo_box, 0, 1)
-# Default Path
 self.default_path_label = QLabel()
 self.save_path_button = QPushButton()
-# --- START: MODIFIED LOGIC ---
 self.save_path_button.clicked.connect(self._save_cookie_and_path)
-# --- END: MODIFIED LOGIC ---
 download_window_layout.addWidget(self.default_path_label, 1, 0)
 download_window_layout.addWidget(self.save_path_button, 1, 1)
-# Save Creator.json Checkbox
 self.save_creator_json_checkbox = QCheckBox()
 self.save_creator_json_checkbox.stateChanged.connect(self._creator_json_setting_changed)
 download_window_layout.addWidget(self.save_creator_json_checkbox, 2, 0, 1, 2)
+self.fetch_first_checkbox = QCheckBox()
+self.fetch_first_checkbox.stateChanged.connect(self._fetch_first_setting_changed)
+download_window_layout.addWidget(self.fetch_first_checkbox, 3, 0, 1, 2)
 main_layout.addWidget(self.download_window_group_box)
 main_layout.addStretch(1)
-# --- OK Button ---
 self.ok_button = QPushButton()
 self.ok_button.clicked.connect(self.accept)
 main_layout.addWidget(self.ok_button, 0, Qt.AlignRight | Qt.AlignBottom)
@@ -113,17 +109,27 @@ class FutureSettingsDialog(QDialog):
 def _load_checkbox_states(self):
 """Loads the initial state for all checkboxes from settings."""
 self.save_creator_json_checkbox.blockSignals(True)
-# Default to True so the feature is on by default for users
 should_save = self.parent_app.settings.value(SAVE_CREATOR_JSON_KEY, True, type=bool)
 self.save_creator_json_checkbox.setChecked(should_save)
 self.save_creator_json_checkbox.blockSignals(False)
+self.fetch_first_checkbox.blockSignals(True)
+should_fetch_first = self.parent_app.settings.value(FETCH_FIRST_KEY, False, type=bool)
+self.fetch_first_checkbox.setChecked(should_fetch_first)
+self.fetch_first_checkbox.blockSignals(False)
 def _creator_json_setting_changed(self, state):
 """Saves the state of the 'Save Creator.json' checkbox."""
 is_checked = state == Qt.Checked
 self.parent_app.settings.setValue(SAVE_CREATOR_JSON_KEY, is_checked)
 self.parent_app.settings.sync()
+def _fetch_first_setting_changed(self, state):
+"""Saves the state of the 'Fetch First' checkbox."""
+is_checked = state == Qt.Checked
+self.parent_app.settings.setValue(FETCH_FIRST_KEY, is_checked)
+self.parent_app.settings.sync()
 def _tr(self, key, default_text=""):
 if callable(get_translation) and self.parent_app:
 return get_translation(self.parent_app.current_selected_language, key, default_text)
@@ -132,33 +138,31 @@ class FutureSettingsDialog(QDialog):
 def _retranslate_ui(self):
 self.setWindowTitle(self._tr("settings_dialog_title", "Settings"))
-# Group Box Titles
 self.interface_group_box.setTitle(self._tr("interface_group_title", "Interface Settings"))
 self.download_window_group_box.setTitle(self._tr("download_window_group_title", "Download & Window Settings"))
-# Interface Group Labels
 self.theme_label.setText(self._tr("theme_label", "Theme:"))
 self.ui_scale_label.setText(self._tr("ui_scale_label", "UI Scale:"))
 self.language_label.setText(self._tr("language_label", "Language:"))
-# Download & Window Group Labels
 self.window_size_label.setText(self._tr("window_size_label", "Window Size:"))
 self.default_path_label.setText(self._tr("default_path_label", "Default Path:"))
 self.save_creator_json_checkbox.setText(self._tr("save_creator_json_label", "Save Creator.json file"))
-# --- START: MODIFIED LOGIC ---
-# Buttons and Controls
+self.fetch_first_checkbox.setText(self._tr("fetch_first_label", "Fetch First (Download after all pages are found)"))
+self.fetch_first_checkbox.setToolTip(self._tr("fetch_first_tooltip", "If checked, the downloader will find all posts from a creator first before starting any downloads.\nThis can be slower to start but provides a more accurate progress bar."))
 self._update_theme_toggle_button_text()
 self.save_path_button.setText(self._tr("settings_save_cookie_path_button", "Save Cookie + Download Path"))
 self.save_path_button.setToolTip(self._tr("settings_save_cookie_path_tooltip", "Save the current 'Download Location' and Cookie settings for future sessions."))
 self.ok_button.setText(self._tr("ok_button", "OK"))
-# --- END: MODIFIED LOGIC ---
-# Populate dropdowns
 self._populate_display_combo_boxes()
 self._populate_language_combo_box()
 self._load_checkbox_states()
-# --- (The rest of the file remains unchanged) ---
 def _apply_theme(self):
 if self.parent_app and self.parent_app.current_theme == "dark":
 scale = getattr(self.parent_app, 'scale_factor', 1)
@@ -285,14 +289,12 @@ class FutureSettingsDialog(QDialog):
 path_saved = False
 cookie_saved = False
-# --- Save Download Path Logic ---
 if hasattr(self.parent_app, 'dir_input') and self.parent_app.dir_input:
 current_path = self.parent_app.dir_input.text().strip()
 if current_path and os.path.isdir(current_path):
 self.parent_app.settings.setValue(DOWNLOAD_LOCATION_KEY, current_path)
 path_saved = True
-# --- Save Cookie Logic ---
 if hasattr(self.parent_app, 'use_cookie_checkbox'):
 use_cookie = self.parent_app.use_cookie_checkbox.isChecked()
 cookie_content = self.parent_app.cookie_text_input.text().strip()
@@ -301,7 +303,7 @@ class FutureSettingsDialog(QDialog):
 self.parent_app.settings.setValue(USE_COOKIE_KEY, True)
 self.parent_app.settings.setValue(COOKIE_TEXT_KEY, cookie_content)
 cookie_saved = True
-else: # Also save the 'off' state
+else:
 self.parent_app.settings.setValue(USE_COOKIE_KEY, False)
 self.parent_app.settings.setValue(COOKIE_TEXT_KEY, "")
@@ -319,4 +321,4 @@ class FutureSettingsDialog(QDialog):
 self._tr("settings_save_nothing_message", "The download location is not a valid directory and no cookie was active."))
 return
 QMessageBox.information(self, self._tr("settings_save_success_title", "Settings Saved"), message)
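The new Discord PDF module in this commit (`create_pdf_from_discord_messages`) rewrites a trailing `Z` to `+00:00` before calling `datetime.datetime.fromisoformat`. It does this because `fromisoformat` only accepts the `Z` suffix from Python 3.11 onward. Below is a minimal standalone sketch of that normalization. `format_message_timestamp` is a hypothetical name. Unlike the module's loop, which catches only `ValueError` and `TypeError`, the sketch also tolerates a non-string value such as `None` instead of raising `AttributeError`.

```python
import datetime


def format_message_timestamp(timestamp_str) -> str:
    """Render an ISO 8601 timestamp as 'YYYY-MM-DD HH:MM:SS'.

    A trailing 'Z' (UTC) is rewritten as '+00:00' so the parse also works on
    Python versions before 3.11. Unparseable strings are returned unchanged,
    and non-string values (e.g. None) become an empty string.
    """
    if not isinstance(timestamp_str, str):
        return ""
    normalized = timestamp_str[:-1] + "+00:00" if timestamp_str.endswith("Z") else timestamp_str
    try:
        parsed = datetime.datetime.fromisoformat(normalized)
    except ValueError:
        return timestamp_str
    return parsed.strftime("%Y-%m-%d %H:%M:%S")
```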


@@ -0,0 +1,146 @@
import os
import re
import datetime
try:
from fpdf import FPDF
FPDF_AVAILABLE = True
class PDF(FPDF):
"""Custom PDF class for Discord chat logs."""
def __init__(self, server_name, channel_name, *args, **kwargs):
super().__init__(*args, **kwargs)
self.server_name = server_name
self.channel_name = channel_name
self.default_font_family = 'DejaVu' # Can be changed to Arial if font fails
def header(self):
if self.page_no() == 1:
return # No header on the title page
self.set_font(self.default_font_family, '', 8)
self.cell(0, 10, f'{self.server_name} - #{self.channel_name}', 0, 0, 'L')
self.cell(0, 10, 'Page ' + str(self.page_no()), 0, 0, 'R')
self.ln(10)
def footer(self):
pass # No footer needed, header has page number
except ImportError:
FPDF_AVAILABLE = False
FPDF = None
PDF = None
def create_pdf_from_discord_messages(messages_data, server_name, channel_name, output_filename, font_path, logger=print):
    """
    Creates a single PDF from a list of Discord message objects, formatted as a chat log.
    Attachments and embeds are rendered as clickable links.
    """
    if not FPDF_AVAILABLE:
        logger("❌ PDF Creation failed: 'fpdf2' library is not installed.")
        return False
    if not messages_data:
        logger(" No messages were found or fetched to create a PDF.")
        return False
    logger(" Sorting messages by date (oldest first)...")
    # Treat a missing or None 'published' value as '' so the sort never compares None with str
    messages_data.sort(key=lambda m: m.get('published') or '')
    pdf = PDF(server_name, channel_name)
    default_font_family = 'DejaVu'
    try:
        bold_font_path = font_path.replace("DejaVuSans.ttf", "DejaVuSans-Bold.ttf")
        if not os.path.exists(font_path) or not os.path.exists(bold_font_path):
            raise RuntimeError("Font files not found")
        # fpdf2 embeds TTF fonts as Unicode automatically; the old 'uni' flag is deprecated
        pdf.add_font('DejaVu', '', font_path)
        pdf.add_font('DejaVu', 'B', bold_font_path)
    except Exception as font_error:
        logger(f" ⚠️ Could not load DejaVu font: {font_error}. Falling back to Arial.")
        default_font_family = 'Arial'
        pdf.default_font_family = 'Arial'
    # --- Title Page ---
    pdf.add_page()
    pdf.set_font(default_font_family, 'B', 24)
    pdf.cell(w=0, h=20, text="Discord Chat Log", align='C', new_x="LMARGIN", new_y="NEXT")
    pdf.ln(10)
    pdf.set_font(default_font_family, '', 16)
    pdf.cell(w=0, h=10, text=f"Server: {server_name}", align='C', new_x="LMARGIN", new_y="NEXT")
    pdf.cell(w=0, h=10, text=f"Channel: #{channel_name}", align='C', new_x="LMARGIN", new_y="NEXT")
    pdf.ln(5)
    pdf.set_font(default_font_family, '', 10)
    pdf.cell(w=0, h=10, text=f"Generated on: {datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')}", align='C', new_x="LMARGIN", new_y="NEXT")
    pdf.cell(w=0, h=10, text=f"Total Messages: {len(messages_data)}", align='C', new_x="LMARGIN", new_y="NEXT")
    pdf.add_page()
    logger(f" Starting PDF creation with {len(messages_data)} messages...")
    for i, message in enumerate(messages_data):
        author_info = message.get('author') or {}
        author = author_info.get('global_name') or author_info.get('username') or 'Unknown User'
        timestamp_str = message.get('published') or ''
        content = message.get('content', '')
        attachments = message.get('attachments') or []
        embeds = message.get('embeds') or []
        try:
            # datetime.fromisoformat() rejects a trailing 'Z' before Python 3.11, so normalize it
            if timestamp_str.endswith('Z'):
                timestamp_str = timestamp_str[:-1] + '+00:00'
            dt_obj = datetime.datetime.fromisoformat(timestamp_str)
            formatted_timestamp = dt_obj.strftime('%Y-%m-%d %H:%M:%S')
        except (ValueError, TypeError):
            formatted_timestamp = timestamp_str
        # Draw a separator line between messages
        if i > 0:
            pdf.ln(2)
            pdf.set_draw_color(200, 200, 200)  # Light grey line
            pdf.cell(0, 0, '', border='T')
            pdf.ln(2)
        # Message header: author name followed by a grey timestamp
        pdf.set_font(default_font_family, 'B', 11)
        pdf.write(5, f"{author} ")
        pdf.set_font(default_font_family, '', 9)
        pdf.set_text_color(128, 128, 128)
        pdf.write(5, f"({formatted_timestamp})")
        pdf.set_text_color(0, 0, 0)
        pdf.ln(6)
        # Message content
        if content:
            pdf.set_font(default_font_family, '', 10)
            pdf.multi_cell(w=0, h=5, text=content, new_x="LMARGIN", new_y="NEXT")
        # Attachments and embeds, rendered as clickable links
        if attachments or embeds:
            pdf.ln(1)
            pdf.set_font(default_font_family, '', 9)
            pdf.set_text_color(22, 119, 219)  # Blue for links
            for att in attachments:
                file_name = att.get('name', 'untitled')
                file_path = att.get('path', '')
                # Build the full attachment URL; leave the entry unlinked when there is no path
                full_url = f"https://kemono.cr/data{file_path}" if file_path else ""
                pdf.write(5, text=f"[Attachment: {file_name}]", link=full_url)
                pdf.ln()  # New line after each attachment
            for embed in embeds:
                embed_url = embed.get('url') or ''
                # Embed URLs are already absolute; an empty link string means no hyperlink
                pdf.write(5, text=f"[Embed: {embed_url or 'no url'}]", link=embed_url)
                pdf.ln()  # New line after each embed
            pdf.set_text_color(0, 0, 0)  # Reset color to black
    try:
        pdf.output(output_filename)
        logger(f"✅ Successfully created Discord chat log PDF: '{os.path.basename(output_filename)}'")
        return True
    except Exception as e:
        logger(f"❌ A critical error occurred while saving the final PDF: {e}")
        return False
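The per-message preprocessing in the loop above (author fallback, ISO-8601 timestamp normalization, and attachment/embed link construction) does not depend on fpdf2, so it can be pulled out and tested on its own. The sketch below is one way to do that, not code from this commit: the helper names `format_timestamp`, `describe_message` and `sort_messages` are hypothetical. The message keys (`author`, `published`, `content`, `attachments`, `embeds`) and the `https://kemono.cr/data` prefix are taken from the code above. The sketch leaves an entry unlinked (empty URL) when an attachment has no `path` or an embed has no `url`.

```python
import datetime
from typing import Any, Dict, List, Tuple

# Attachment prefix taken from the PDF module above.
ATTACHMENT_BASE_URL = "https://kemono.cr/data"


def format_timestamp(raw: Any) -> str:
    """Format an ISO-8601 string as 'YYYY-MM-DD HH:MM:SS'; fall back to the raw text."""
    text = raw or ""
    if not isinstance(text, str):
        return str(text)
    # datetime.fromisoformat() only accepts a trailing 'Z' from Python 3.11 on.
    normalized = text[:-1] + "+00:00" if text.endswith("Z") else text
    try:
        parsed = datetime.datetime.fromisoformat(normalized)
    except ValueError:
        return text
    return parsed.strftime("%Y-%m-%d %H:%M:%S")


def describe_message(message: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: collect the fields the PDF loop renders for one message."""
    author_info = message.get("author") or {}
    author = author_info.get("global_name") or author_info.get("username") or "Unknown User"
    links: List[Tuple[str, str]] = []
    for att in message.get("attachments") or []:
        path = att.get("path") or ""
        # Only build a URL when a path exists; otherwise the label stays unlinked.
        url = f"{ATTACHMENT_BASE_URL}{path}" if path else ""
        links.append((f"[Attachment: {att.get('name', 'untitled')}]", url))
    for embed in message.get("embeds") or []:
        url = embed.get("url") or ""
        links.append((f"[Embed: {url or 'no url'}]", url))
    return {
        "author": author,
        "timestamp": format_timestamp(message.get("published")),
        "content": message.get("content") or "",
        "links": links,
    }


def sort_messages(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Oldest first; messages with no 'published' value sort to the front."""
    return sorted(messages, key=lambda m: m.get("published") or "")
```

The returned `(label, url)` pairs map onto fpdf2's `write(h, text=..., link=...)`, where an empty `link` string means no hyperlink is attached.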

File diff suppressed because it is too large.


@@ -141,12 +141,15 @@ def prepare_cookies_for_request(use_cookie_flag, cookie_text_input, selected_coo
 def extract_post_info(url_string):
     """
     Parses a URL string to extract the service, user ID, and post ID.
+    UPDATED to support Discord server/channel URLs.
     Args:
         url_string (str): The URL to parse.
     Returns:
-        tuple: A tuple containing (service, user_id, post_id). Any can be None.
+        tuple: A tuple containing (service, id1, id2).
+        For posts: (service, user_id, post_id).
+        For Discord: ('discord', server_id, channel_id).
     """
     if not isinstance(url_string, str) or not url_string.strip():
         return None, None, None
@@ -155,7 +158,15 @@ def extract_post_info(url_string):
         parsed_url = urlparse(url_string.strip())
         path_parts = [part for part in parsed_url.path.strip('/').split('/') if part]
-        # Standard format: /<service>/user/<user_id>/post/<post_id>
+        # Check for new Discord URL format first
+        # e.g., /discord/server/891670433978531850/1252332668805189723
+        if len(path_parts) >= 3 and path_parts[0].lower() == 'discord' and path_parts[1].lower() == 'server':
+            service = 'discord'
+            server_id = path_parts[2]
+            channel_id = path_parts[3] if len(path_parts) >= 4 else None
+            return service, server_id, channel_id
+        # Standard creator/post format: /<service>/user/<user_id>/post/<post_id>
         if len(path_parts) >= 3 and path_parts[1].lower() == 'user':
             service = path_parts[0]
             user_id = path_parts[2]
@@ -174,7 +185,6 @@ def extract_post_info(url_string):
     return None, None, None
 def get_link_platform(url):
     """
     Identifies the platform of a given URL based on its domain.
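The `extract_post_info` change adds a Discord server/channel branch alongside the creator/post branch, and both return the same three-slot tuple. A minimal standalone sketch of that routing, using only `urllib.parse`, follows. The name `parse_kemono_url` is hypothetical, and the post-ID branch is an assumption based on the `/<service>/user/<user_id>/post/<post_id>` comment, because the lines of the original function that extract the post ID fall outside these hunks.

```python
from urllib.parse import urlparse


def parse_kemono_url(url_string):
    """Hypothetical sketch: return (service, id1, id2) like extract_post_info.

    Discord URLs yield ('discord', server_id, channel_id); creator/post URLs
    yield (service, user_id, post_id). Anything else yields (None, None, None).
    """
    if not isinstance(url_string, str) or not url_string.strip():
        return None, None, None
    parts = [p for p in urlparse(url_string.strip()).path.strip("/").split("/") if p]

    # Discord: /discord/server/<server_id>[/<channel_id>]
    if len(parts) >= 3 and parts[0].lower() == "discord" and parts[1].lower() == "server":
        channel_id = parts[3] if len(parts) >= 4 else None
        return "discord", parts[2], channel_id

    # Creator/post (assumed layout): /<service>/user/<user_id>[/post/<post_id>]
    if len(parts) >= 3 and parts[1].lower() == "user":
        post_id = parts[4] if len(parts) >= 5 and parts[3].lower() == "post" else None
        return parts[0], parts[2], post_id

    return None, None, None
```

Because the second slot means "server ID" for Discord and "user ID" otherwise, callers should branch on the returned service before interpreting the IDs.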


@@ -391,6 +391,10 @@ def setup_ui(main_app):
     main_app.link_search_button.setVisible(False)
     main_app.link_search_button.setFixedWidth(int(30 * scale))
     log_title_layout.addWidget(main_app.link_search_button)
+    main_app.discord_scope_toggle_button = QPushButton("Scope: Files")
+    main_app.discord_scope_toggle_button.setVisible(False) # Hidden by default
+    main_app.discord_scope_toggle_button.setFixedWidth(int(140 * scale))
+    log_title_layout.addWidget(main_app.discord_scope_toggle_button)
     main_app.manga_rename_toggle_button = QPushButton()
     main_app.manga_rename_toggle_button.setVisible(False)
     main_app.manga_rename_toggle_button.setFixedWidth(int(140 * scale))