182 Commits

Author SHA1 Message Date
Felipe
9fd2a7f8d1 Minor refactoring of HTML tag sanitization 2025-08-12 08:42:27 -03:00
Felipe
6ad312f31f Sanitizing HTML tags
some sites contain tags *in* their URLs, and those files fail to save on some platforms such as Windows
2025-08-05 23:44:34 +00:00
Felipe
62ea35daa6 Bumping version 2025-08-04 21:23:48 +00:00
adampweb
801fb77f79 Perf: Refactored a huge function into smaller subprocesses 2025-07-29 21:12:20 +02:00
Felipe
2bf04aff48 Sanitize base_url and directory parameters
this might be the cause of #25, at least from what it appears
2025-07-27 17:18:57 +00:00
Felipe
51becde916 Minor fix 2025-07-26 21:01:40 +00:00
Felipe
c30ee73977 Sanitize file_id
we were not consistently handling non-UTF-8 characters here, especially after commit e4487baafcab64d2b81a5fd7a6b572ac8fa772e2. This also fixes #25
2025-07-26 20:58:50 +00:00
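A minimal sketch of what sanitizing a non-UTF-8 `file_id` could look like, assuming the goal is to replace invalid byte sequences before the value is used as a path. The method name `sanitize_file_id` and the `_` replacement character are illustrative assumptions, not taken from the commit.

```ruby
# Hypothetical helper: the name and the "_" replacement are assumptions.
def sanitize_file_id(file_id)
  # Treat the bytes as UTF-8, then replace any invalid sequences.
  # dup avoids mutating the caller's string.
  file_id.dup.force_encoding(Encoding::UTF_8).scrub('_')
end

raw = "caf\xE9.html"            # 0xE9 is Latin-1 "é", invalid as UTF-8 here
clean = sanitize_file_id(raw)
puts clean.valid_encoding?      # true
```

`String#scrub` only touches invalid bytes, so identifiers that are already valid UTF-8 pass through unchanged.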
Felipe
d3466b3387 Bumping version
normally I would've yanked the old gem, but that's not working here
2025-07-22 12:41:26 +00:00
Felipe
bff10e7260 Initial implementation of a composite snapshot
see issue #22. TBF
2025-07-21 15:30:49 +00:00
Felipe
3d181ce84c Bumped version 2025-07-21 13:48:34 +00:00
Alfonso Corrado
999aa211ae fix match filters 2025-07-21 13:42:44 +00:00
Felipe
fd329afdd2 Merge pull request #20 from underarchiver/rfc3968-url-validity-check
Prevent fetching off non RFC3968-compliant URLs
2025-07-11 10:55:12 -03:00
Felipe
038785557d Ability to recursively download across subdomains
this is quite experimental. Fixes #15 but still needs more testing
2025-07-09 12:53:58 +00:00
Felipe
2eead8cc27 Bumping version 2025-06-27 19:50:39 +00:00
cybercode3
7e5cdd54fb Fix: path sanitizer and timestamp sorting errors
(I encountered these errors when running the script on Windows 11. Changing these two lines got the script to work for me.)

- Fixed a bug in Windows path sanitizer where String#gsub was incorrectly called with a Proc as the replacement. Replaced with block form to ensure proper character escaping for Windows-incompatible file path characters.
- Fixed an ArgumentError in file sorting when a file snapshot’s timestamp was nil. Updated sort logic to safely handle nil timestamps by converting them to strings or integers, preventing comparison errors between NilClass and String/Integer.

These changes prevent fatal runtime errors when downloading files with certain URLs or incomplete metadata, improving robustness for sites with inconsistent archive data.
2025-06-25 02:07:20 +00:00
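The two fixes described in this commit can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the helper names, the character set in `WINDOWS_UNSAFE`, the percent-style escaping, and the `:timestamp` hash key are all hypothetical.

```ruby
# Hypothetical character set and helper names, for illustration only.
WINDOWS_UNSAFE = /[:*?"<>|]/

# String#gsub accepts a String or Hash as its second argument; passing a
# Proc there raises a TypeError. The block form below is the supported
# way to compute each replacement dynamically.
def sanitize_for_windows(path)
  path.gsub(WINDOWS_UNSAFE) { |char| format('%%%02x', char.ord) }
end

# Comparing nil with a String or Integer raises ArgumentError during a
# sort. Converting every key with to_s makes them comparable; nil becomes
# "" and therefore sorts first.
def sort_by_timestamp(snapshots)
  snapshots.sort_by { |snap| snap[:timestamp].to_s }
end

puts sanitize_for_windows('page?id=1:2')
sorted = sort_by_timestamp([{ timestamp: '2024' }, { timestamp: nil }, { timestamp: '2021' }])
puts sorted.map { |s| s[:timestamp].inspect }.join(', ')
```

Sorting on `to_s` compares timestamps as strings, which orders fixed-width numeric timestamps correctly; values of differing widths would need an integer conversion instead.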
Felipe
4160ff5e4a Bumping version 2025-06-18 18:05:31 +00:00
underarchiver
f03d92a3c4 Prevent fetching off non RFC3968-compliant URLs 2025-06-17 13:27:10 +02:00
Felipe
18357a77ed Correct file path and sanitization in Windows
Not only were we not normalizing the file directories, we were also aggressively sanitizing incorrect characters, leading to some funny stuff on Windows. Fixes #16
2025-06-15 13:48:11 +00:00
Felipe
3fdfd70fc1 Bump version 2025-06-05 22:34:40 +00:00
Eli Dickinson
79cbb639e7 Fix bug with archive urls containing square brackets 2025-06-03 16:36:03 -04:00
Felipe
f38756dd76 Correction for downloaded data folder
if you downloaded content from example.org/*, it would be listed in a folder titled * instead of the sitename. See #6 (and thanks to elidickinson for pointing it out!)
2025-05-30 14:00:32 +00:00
Felipe
9452411e32 Added nil checks 2025-05-30 13:52:25 +00:00
Felipe
61e22cfe25 Bump versions 2025-05-27 18:10:09 +00:00
Felipe
183ed61104 Attempt at fixing --all
I honestly don't recall if this was implemented in the original code, and I'm guessing this worked at *some point* during this fork. It seems to work correctly now, however. See #6 and #11
2025-05-27 17:17:34 +00:00
Felipe
ab4324c0eb Bumping to 2.3.6 2025-05-18 16:49:44 +00:00
Felipe
e28d7d578b Experimental ability to rewrite URLs to local browsing 2025-05-18 16:48:50 +00:00
adampweb
1ef8c14c48 Removed unused variable from if condition 2025-05-11 10:57:36 +02:00
Felipe
917f4f8798 Bumping version 2025-04-30 13:05:30 +00:00
Felipe
4db13a7792 Fix --all-timestamps
we were accidentally removing the timestamp prefix from `file_id`, rendering that option useless in 2.3.4. This should work again now. This will fix #4
2025-04-30 13:01:29 +00:00
Felipe
31d51728af Bump version 2025-04-19 14:07:05 +00:00
Felipe
febffe5de4 Added support for resuming incomplete downloads 2025-04-19 13:40:14 +00:00
Felipe
27dd619aa4 gzip support 2025-04-19 13:07:07 +00:00
Felipe
0c701ee890 Fetching API calls sequentially
although the WM API is particularly wonky and this will not prevent all errors, this aligns better with what we have here.
2025-03-29 22:27:01 +00:00
Felipe
2243958643 Fixes in cases of too many redirects or files not found 2025-02-09 16:48:52 +00:00
Felipe
9283f04a57 Added ability to download rewritten Wayback Archive files 2025-01-02 12:17:20 +00:00
Felipe
b38d528656 typo fix 2025-01-01 12:20:06 +00:00
Felipe
4d5f187f15 Proper connection pool lifecycle management 2024-12-31 16:48:29 +00:00
Felipe
7de1c5a028 typo fix 2024-12-31 15:03:28 +00:00
Felipe
75617060d7 Workflow fixes, pt.3
You've gotta be squidding me. How did I never notice this
2024-12-05 12:11:16 +00:00
Felipe
02785b2eba Workflow fixes, pt. 1 2024-12-05 12:00:44 +00:00
Felipe
d1b70d83b1 Minor cleanup 2024-12-05 11:53:38 +00:00
Felipe
45fa2be573 Significant refactoring
including extra config settings, a proper rate limit, and a logger. Fixes: #307 #291 #281 #269 and probably others too
2024-12-03 00:23:47 +00:00
Felipe
a3ac4e0341 Minor cleanup 2024-06-26 20:30:59 +00:00
Felipe
93a6fb3c1b typo 2024-06-26 19:52:34 +00:00
Felipe
509d7034a1 Setting file modified time to value reported by Wayback Machine
Implements 937306712c564e5757d898feacc14fbabd10722d, fixes #174 (Maintain original creation/modified dates of files while downloading)
2024-06-26 19:52:12 +00:00
Felipe
0a7752eb41 Minor cleanup 2024-06-26 19:47:19 +00:00
Felipe
cff30f529e Using Net::HTTP and decompressing gzip content
see https://github.com/ShiftaDeband/wayback-machine-downloader and bf6e33c2fe
2024-06-26 16:54:55 +00:00
hartator
cf770c2e55 Bump gem version 2021-09-04 01:51:08 -05:00
Paul Wise
9da87bfa74 Make URI#open cross Ruby versions compatible
Inspired-by: commit 30475c5c9e1d92d63b75dc5f22a40dd16c1aa23a
2021-06-08 07:59:38 +08:00
hartator
83b4f880b1 Bump Gem version 2021-06-06 19:47:48 -05:00