adampweb
dec9083b43
Fix: Fixed trivial mistake with function call
2025-09-04 19:24:44 +00:00
Felipe
c517bd20d3
Actual retry implementation
...
seems I pushed an older revision of this
2025-09-04 19:16:52 +00:00
Felipe
fc8d8a9441
Added retry command
...
fixes [Feature request] Retry flag
Fixes StrawberryMaster/wayback-machine-downloader#31
2025-08-20 01:21:29 +00:00
Felipe
fa306ac92b
Bumped version
2025-08-19 16:17:53 +00:00
Felipe
8c27aaebc9
Fix issue with index.html pages not loading
...
we were rejecting empty paths, causing these files to be skipped. How did I miss this?
2025-08-19 16:16:24 +00:00
Felipe
40e9c9bb51
Bumped version
2025-08-16 19:38:01 +00:00
Felipe
6bc08947b7
More aggressive sanitization
...
this should deal with some of the issues we've seen, luckily. What a ride!
2025-08-12 18:55:00 -03:00
Felipe
c731e0c7bd
Bumped version
2025-08-12 11:46:03 +00:00
Felipe
9fd2a7f8d1
Minor refactoring of HTML tag sanitization
2025-08-12 08:42:27 -03:00
Felipe
6ad312f31f
Sanitizing HTML tags
...
some sites contain tags *in* their URLs, which then fail to save on some platforms, such as Windows
2025-08-05 23:44:34 +00:00
Felipe
62ea35daa6
Bumping version
2025-08-04 21:23:48 +00:00
Felipe
1f4202908f
Fixes for tidy_bytes
...
admittedly not the cleanest way to do this, although it works for #25.
2025-07-31 12:58:22 -03:00
adampweb
801fb77f79
Perf: Refactored a huge function into smaller subprocesses
2025-07-29 21:12:20 +02:00
Felipe
bc868e6b39
Refactor tidy_bytes.rb
...
I'm not sure if we can easily determine the encoding behind each site (and I don't think Wayback Machine does that), *but* we can at least translate it and get it to download. This should be mostly useful for other, non-Western European languages. See #25
2025-07-29 10:10:56 -03:00
Felipe
2bf04aff48
Sanitize base_url and directory parameters
...
this might be the cause of #25, at least from what it appears
2025-07-27 17:18:57 +00:00
Felipe
51becde916
Minor fix
2025-07-26 21:01:40 +00:00
Felipe
c30ee73977
Sanitize file_id
...
we were not consistently handling non-UTF-8 characters here, especially after commit e4487baafcab64d2b81a5fd7a6b572ac8fa772e2. This also fixes #25
2025-07-26 20:58:50 +00:00
Felipe
d3466b3387
Bumping version
...
normally I would've yanked the old gem, but that's not working here
2025-07-22 12:41:26 +00:00
Felipe
0663c1c122
Merge pull request #23 from adampweb/master
...
Fixed base image vulnerability
2025-07-21 14:44:43 -03:00
Felipe
bff10e7260
Initial implementation of a composite snapshot
...
see issue #22. TBF
2025-07-21 15:30:49 +00:00
Felipe
3d181ce84c
Bumped version
2025-07-21 13:48:34 +00:00
Alfonso Corrado
999aa211ae
fix match filters
2025-07-21 13:42:44 +00:00
adampweb
e4487baafc
Fix: Handle default case in tidy_bytes
2025-07-20 17:13:36 +02:00
Felipe
fd329afdd2
Merge pull request #20 from underarchiver/rfc3968-url-validity-check
...
Prevent fetching off non RFC3968-compliant URLs
2025-07-11 10:55:12 -03:00
Felipe
038785557d
Ability to recursively download across subdomains
...
this is quite experimental. Fixes #15 but still needs more testing
2025-07-09 12:53:58 +00:00
Felipe
2eead8cc27
Bumping version
2025-06-27 19:50:39 +00:00
cybercode3
7e5cdd54fb
Fix: path sanitizer and timestamp sorting errors
...
(I encountered these errors with the script on Windows 11. Changing these two lines got the script to work for me.)
- Fixed a bug in Windows path sanitizer where String#gsub was incorrectly called with a Proc as the replacement. Replaced with block form to ensure proper character escaping for Windows-incompatible file path characters.
- Fixed an ArgumentError in file sorting when a file snapshot’s timestamp was nil. Updated sort logic to safely handle nil timestamps by converting them to strings or integers, preventing comparison errors between NilClass and String/Integer.
These changes prevent fatal runtime errors when downloading files with certain URLs or incomplete metadata, improving robustness for sites with inconsistent archive data.
2025-06-25 02:07:20 +00:00
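The two fixes described in the commit above can be sketched roughly as follows. This is an illustration only, not the project's actual code: the method names, the character set, the percent-escaping scheme, and the hash layout of a snapshot are all assumptions. The point is that `String#gsub` accepts a String or Hash as its second argument (a Proc there raises a TypeError), so a computed replacement belongs in the block form, and that coercing `nil` timestamps with `to_s` gives every sort key the same type.

```ruby
# Hypothetical sketch; names and escaping scheme are assumptions,
# not taken from the wayback-machine-downloader source.

# A subset of characters that Windows does not allow in file names.
WINDOWS_FORBIDDEN = /[:*?"<>|]/

def sanitize_for_windows(path)
  # Block form of gsub: the block's return value replaces each match.
  # Passing a Proc as the second positional argument would raise TypeError.
  path.gsub(WINDOWS_FORBIDDEN) { |ch| format('%%%02X', ch.ord) }
end

def sort_newest_first(snapshots)
  # nil.to_s is "", so every sort key is a String and the comparison
  # between NilClass and String never happens.
  snapshots.sort_by { |s| s[:timestamp].to_s }.reverse
end
```

With this sketch, entries that lack a timestamp sort last instead of aborting the run.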
Felipe
4160ff5e4a
Bumping version
2025-06-18 18:05:31 +00:00
underarchiver
f03d92a3c4
Prevent fetching off non RFC3968-compliant URLs
2025-06-17 13:27:10 +02:00
Eli Dickinson
c3c5b8446a
don’t append /* when --exact-url is set
2025-06-15 13:26:11 -04:00
Felipe
18357a77ed
Correct file path and sanitization in Windows
...
Not only were we not normalizing the file directories, we were also aggressively sanitizing incorrect characters, leading to some funny stuff on Windows. Fixes #16
2025-06-15 13:48:11 +00:00
Felipe
3fdfd70fc1
Bump version
2025-06-05 22:34:40 +00:00
Eli Dickinson
79cbb639e7
Fix bug with archive URLs containing square brackets
2025-06-03 16:36:03 -04:00
Eli Dickinson
1681a12579
workaround for API only showing html files for some domains
...
See https://github.com/StrawberryMaster/wayback-machine-downloader/issues/6
2025-05-30 12:50:48 -04:00
Felipe
f38756dd76
Correction for downloaded data folder
...
if you downloaded content from example.org/*, it would be listed in a folder titled * instead of the sitename. See #6 (and thanks to elidickinson for pointing it out!)
2025-05-30 14:00:32 +00:00
Felipe
9452411e32
Added nil checks
2025-05-30 13:52:25 +00:00
Felipe
61e22cfe25
Bump versions
2025-05-27 18:10:09 +00:00
Felipe
183ed61104
Attempt at fixing --all
...
I honestly don't recall if this was implemented in the original code, and I'm guessing this worked at *some point* during this fork. It seems to work correctly now, however. See #6 and #11
2025-05-27 17:17:34 +00:00
Felipe
ab4324c0eb
Bumping to 2.3.6
2025-05-18 16:49:44 +00:00
Felipe
e28d7d578b
Experimental ability to rewrite URLs to local browsing
2025-05-18 16:48:50 +00:00
adampweb
1ef8c14c48
Removed unused variable from if condition
2025-05-11 10:57:36 +02:00
Felipe
917f4f8798
Bumping version
2025-04-30 13:05:30 +00:00
Felipe
4db13a7792
Fix --all-timestamps
...
we were accidentally removing the timestamp prefix from `file_id`, rendering that option useless in 2.3.4. This should work again now. Fixes #4
2025-04-30 13:01:29 +00:00
Felipe
31d51728af
Bump version
2025-04-19 14:07:05 +00:00
Felipe
febffe5de4
Added support for resuming incomplete downloads
2025-04-19 13:40:14 +00:00
Felipe
27dd619aa4
gzip support
2025-04-19 13:07:07 +00:00
Felipe
0c701ee890
Fetching API calls sequentially
...
although the WM API is particularly wonky and this will not prevent all errors, this aligns better with what we have here.
2025-03-29 22:27:01 +00:00
Felipe
2243958643
Fixes in cases of too many redirects or files not found
2025-02-09 16:48:52 +00:00
Felipe
46450d7c20
Refactoring tidy_bytes, part 2
2025-02-09 16:47:29 +00:00
Felipe
019534794c
Taking care of empty responses
...
fixes "unexpected token at ''" appearing after fetching a list of snapshots
2025-02-09 16:24:02 +00:00