Felipe
b2fc748c2c
another page_requisites fix
2025-12-10 12:16:26 +00:00
Felipe
8632050c45
page requisites fix
2025-12-10 12:13:39 +00:00
Felipe
2aa694eed0
Initial implementation of --page-requisites
...
see StrawberryMaster/wayback-machine-downloader#39
2025-12-10 11:59:00 +00:00
Felipe
4d2513eca8
Be a bit more tolerant of timeouts here
2025-11-15 12:59:07 +00:00
Felipe
67685b781e
Improve handling for wildcard URLs
...
fixes #38
2025-11-15 12:45:34 +00:00
Felipe
f7c0f1a964
Better support for .php, .asp, and other files when using --local
...
see #37
2025-11-04 23:18:04 +00:00
Nicolai Weitkemper
99da3ca48e
Fix Docker command volume mount path in README (#35)
2025-10-28 15:30:19 -03:00
Felipe
34f22c128c
Bump to 2.4.4
v2.4.4
2025-10-27 16:51:58 +00:00
Felipe
71bdc7c2de
Use explicit current directory to avoid ambiguity
...
see `Results saved in /build/websites` but nothing is saved :(
Fixes StrawberryMaster/wayback-machine-downloader#34
2025-10-27 16:48:15 +00:00
Felipe
4b1ec1e1cc
Added troubleshooting section
...
includes a workaround fix for SSL CRL error
Fixes StrawberryMaster/wayback-machine-downloader#33
2025-10-08 11:33:50 +00:00
Felipe
d7a63361e3
Use a FixedThreadPool for concurrent API calls
2025-09-24 21:05:22 +00:00
Felipe
b1974a8dfa
Refactor ConnectionPool to use SizedQueue for connection management and improve cleanup logic
2025-09-24 20:50:10 +00:00
Huw Fulcher
012b295aed
Corrected wrong flag in example (#32)
...
Example 2 in Performance section incorrectly stated to use `--snapshot-pages` whereas the parameter is actually `--maximum-snapshot`
2025-09-10 08:06:57 -03:00
adampweb
dec9083b43
Fix: Fixed trivial mistake with function call
2025-09-04 19:24:44 +00:00
Felipe
c517bd20d3
Actual retry implementation
...
it seems I pushed an older revision of this earlier
2025-09-04 19:16:52 +00:00
Felipe
fc8d8a9441
Added retry command
...
fixes [Feature request] Retry flag
Fixes StrawberryMaster/wayback-machine-downloader#31
2025-08-20 01:21:29 +00:00
Felipe
fa306ac92b
Bumped version
v2.4.3
2025-08-19 16:17:53 +00:00
Felipe
8c27aaebc9
Fix issue with index.html pages not loading
...
we were rejecting empty paths, causing these files to be skipped. How did I miss this?
2025-08-19 16:16:24 +00:00
Felipe
40e9c9bb51
Bumped version
v2.4.2
2025-08-16 19:38:01 +00:00
Felipe
6bc08947b7
More aggressive sanitization
...
this should deal with some of the issues we've seen, luckily. What a ride!
2025-08-12 18:55:00 -03:00
Felipe
c731e0c7bd
Bumped version
v2.4.1
2025-08-12 11:46:03 +00:00
Felipe
9fd2a7f8d1
Minor refactoring of HTML tag sanitization
2025-08-12 08:42:27 -03:00
Felipe
6ad312f31f
Sanitizing HTML tags
...
some sites contain tags *in* their URL, and fail to save on some devices like Windows
2025-08-05 23:44:34 +00:00
Felipe
62ea35daa6
Bumping version
v2.4.0
2025-08-04 21:23:48 +00:00
Felipe
1f4202908f
Fixes for tidy_bytes
...
admittedly not the cleanest way to do this, although it works for #25.
2025-07-31 12:58:22 -03:00
Felipe
bed3f6101c
Added missing gemspec file
2025-07-31 12:57:03 -03:00
Felipe
754df6b8d6
Merge pull request #27 from adampweb/master
...
Refactored huge functions & cleanup
2025-07-29 18:09:51 -03:00
adampweb
801fb77f79
Perf: Refactored a huge function into smaller subprocesses
2025-07-29 21:12:20 +02:00
adampweb
e9849e6c9c
Cleanup: I removed the obsolete options.
...
The classic way provides more flexibility
2025-07-29 20:55:10 +02:00
Felipe
bc868e6b39
Refactor tidy_bytes.rb
...
I'm not sure if we can easily determine the encoding behind each site (and I don't think Wayback Machine does that), *but* we can at least translate it and get it to download. This should be mostly useful for other, non-Western European languages. See #25
2025-07-29 10:10:56 -03:00
Felipe
2bf04aff48
Sanitize base_url and directory parameters
...
this might be the cause of #25, at least from what it appears
2025-07-27 17:18:57 +00:00
Felipe
51becde916
Minor fix
2025-07-26 21:01:40 +00:00
Felipe
c30ee73977
Sanitize file_id
...
we were not consistently handling non-UTF-8 characters here, especially after commit e4487baafcab64d2b81a5fd7a6b572ac8fa772e2. This also fixes #25
2025-07-26 20:58:50 +00:00
Felipe
d3466b3387
Bumping version
...
normally I would've yanked the old gem, but that's not working here
v2.3.12
2025-07-22 12:41:26 +00:00
Felipe
0250579f0e
Added missing file
2025-07-22 12:38:12 +00:00
Felipe
0663c1c122
Merge pull request #23 from adampweb/master
...
Fixed base image vulnerability
2025-07-21 14:44:43 -03:00
adampweb
93115f70ec
Merge pull request #5 from adampweb/snyk-fix-88576ceadf7e0c41b63a2af504a3c8ae
...
[Snyk] Security upgrade ruby from 3.4.4-alpine to 3.4.5-alpine
2025-07-21 18:46:03 +02:00
snyk-bot
3d37ae10fd
fix: Dockerfile to reduce vulnerabilities
...
The following vulnerabilities are fixed with an upgrade:
- https://snyk.io/vuln/SNYK-ALPINE322-OPENSSL-10597997
2025-07-21 16:45:10 +00:00
Felipe
bff10e7260
Initial implementation of a composite snapshot
...
see issue #22. TBF
2025-07-21 15:30:49 +00:00
Felipe
3d181ce84c
Bumped version
v2.3.11
2025-07-21 13:48:34 +00:00
Alfonso Corrado
999aa211ae
fix match filters
2025-07-21 13:42:44 +00:00
adampweb
ffdce7e4ec
Exclude dev environment config
2025-07-20 17:14:09 +02:00
adampweb
e4487baafc
Fix: Handle default case in tidy_bytes
2025-07-20 17:13:36 +02:00
Felipe
82ff2de3dc
Added brief note for users with both WMD gems here
2025-07-14 08:12:38 -03:00
Felipe
fd329afdd2
Merge pull request #20 from underarchiver/rfc3968-url-validity-check
...
Prevent fetching of non-RFC 3986-compliant URLs
2025-07-11 10:55:12 -03:00
Felipe
038785557d
Ability to recursively download across subdomains
...
this is quite experimental. Fixes #15 but still needs more testing
2025-07-09 12:53:58 +00:00
Felipe
2eead8cc27
Bumping version
v2.3.10
2025-06-27 19:50:39 +00:00
cybercode3
7e5cdd54fb
Fix: path sanitizer and timestamp sorting errors
...
(I encountered these errors with the script on Windows 11. Changing these two lines got the script to work for me.)
- Fixed a bug in Windows path sanitizer where String#gsub was incorrectly called with a Proc as the replacement. Replaced with block form to ensure proper character escaping for Windows-incompatible file path characters.
- Fixed an ArgumentError in file sorting when a file snapshot’s timestamp was nil. Updated sort logic to safely handle nil timestamps by converting them to strings or integers, preventing comparison errors between NilClass and String/Integer.
These changes prevent fatal runtime errors when downloading files with certain URLs or incomplete metadata, improving robustness for sites with inconsistent archive data.
2025-06-25 02:07:20 +00:00
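The two fixes described in the commit above can be illustrated with a minimal Ruby sketch. This is not the project's code: the constant `WINDOWS_UNSAFE`, the helpers `sanitize_for_windows` and `sort_by_timestamp`, the percent-style escaping, and the hash layout of a snapshot are all assumptions made for illustration. The sketch shows only the two techniques the commit names: the block form of `String#gsub`, and converting a possibly-nil timestamp before comparing.

```ruby
# Hypothetical character set; the real sanitizer may cover different characters.
WINDOWS_UNSAFE = /[:*?"<>|]/

# String#gsub expects a String or Hash as its second argument, not a Proc.
# With the block form, the block's return value replaces each match.
def sanitize_for_windows(path)
  path.gsub(WINDOWS_UNSAFE) { |ch| format('%%%02x', ch.ord) }
end

# nil.to_s is "", so snapshots without a timestamp sort first
# instead of triggering a comparison between nil and a String.
def sort_by_timestamp(snapshots)
  snapshots.sort_by { |snapshot| snapshot[:timestamp].to_s }
end

sanitized = sanitize_for_windows('page?id=1')
sorted = sort_by_timestamp([
  { file_id: 'b', timestamp: '20250101000000' },
  { file_id: 'a', timestamp: nil }
])
```

Sorting on `to_s` keeps every key a String, which is one way to avoid mixed-type comparisons; converting with `to_i` would work equally well if timestamps are treated as integers.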
Felipe
4160ff5e4a
Bumping version
v2.3.9
2025-06-18 18:05:31 +00:00
underarchiver
f03d92a3c4
Prevent fetching of non-RFC 3986-compliant URLs
2025-06-17 13:27:10 +02:00