31 Commits

Author SHA1 Message Date
Felipe
1f4202908f
Fixes for tidy_bytes
admittedly not the cleanest way to do this, although it works for #25.
2025-07-31 12:58:22 -03:00
adampweb
801fb77f79 Perf: Refactored a huge function into smaller subprocesses 2025-07-29 21:12:20 +02:00
Felipe
bc868e6b39
Refactor tidy_bytes.rb
I'm not sure if we can easily determine the encoding behind each site (and I don't think Wayback Machine does that), *but* we can at least translate it and get it to download. This should be mostly useful for other, non-Western European languages. See #25
2025-07-29 10:10:56 -03:00
Felipe
2bf04aff48 Sanitize base_url and directory parameters
this might be the cause of #25, at least from what it appears
2025-07-27 17:18:57 +00:00
adampweb
e4487baafc Fix: Handle default case in tidy_bytes 2025-07-20 17:13:36 +02:00
Felipe
038785557d Ability to recursively download across subdomains
this is quite experimental. Fixes #15 but still needs more testing
2025-07-09 12:53:58 +00:00
Eli Dickinson
c3c5b8446a don’t append /* when --exact-url 2025-06-15 13:26:11 -04:00
Eli Dickinson
1681a12579 workaround for API only showing html files for some domains
See https://github.com/StrawberryMaster/wayback-machine-downloader/issues/6
2025-05-30 12:50:48 -04:00
Felipe
febffe5de4
Added support for resuming incomplete downloads 2025-04-19 13:40:14 +00:00
Felipe
46450d7c20
Refactoring tidy_bytes, part 2 2025-02-09 16:47:29 +00:00
Felipe
019534794c
Taking care of empty responses
fixes "unexpected token at ''" appearing after fetching a list of snapshots
2025-02-09 16:24:02 +00:00
Felipe
fdcb81f1a0 Refactoring 2024-12-31 16:50:50 +00:00
Felipe
9bbb67cd90 More testing 2024-12-31 00:11:58 +00:00
Felipe
466228fee4 Refactoring the archive API 2024-06-26 16:53:08 +00:00
hartator
30475c5c9e Make URI#open cross Ruby versions compatible 2021-06-06 19:47:11 -05:00
Paul Wise
ea15965d6d
Fix typos
Suggested-by: codespell, spellintian
2021-05-03 20:20:09 +08:00
Paul Wise
cd29f79fd0
Switch to the JSON output format for easier parsing 2021-05-03 17:44:56 +08:00
Paul Wise
afab72c894
Construct the cdx API query using a URI object
This avoids problems related to URL encoding.

Obsoletes: https://github.com/hartator/wayback-machine-downloader/pull/116
2021-05-03 17:44:36 +08:00
DessertArbiter
15edae6a92 updated deprecated calls, changed URI to https 2020-05-27 20:28:06 -04:00
Oleg Pudeyev
aab9a49509 Get rid of assigned but unused variable warnings under ruby 2.4 2017-06-03 17:00:50 -04:00
Oleg Pudeyev
e6157c21b9 Parens are required before * when used for splatting.
https://stackoverflow.com/questions/41821628/ruby-how-can-i-kill-warning-interpreted-as-argument-prefix
2017-06-03 16:59:08 -04:00
Oleg Pudeyev
6779971dc9 Fix whitespace 2017-03-15 17:08:40 -04:00
hartator
8d5be7a89e Fix compatibility with Ruby 1.9.x and proxies 2016-11-14 18:18:58 -06:00
Anton Eliasson
54bd5d3852 Support http(s)_proxy ENV variables
Closes issue #65
2016-10-31 17:49:30 +01:00
hartator
7eedc1a183 Get snapshot result page per page index 2016-09-24 10:04:57 -07:00
hartator
87eee70969 Show 404 archives when a resource had 200 response previously 2016-09-18 12:24:29 -05:00
hartator
21dd22f581 Disable gzip compression on API calls 2016-09-17 14:42:32 -05:00
hartator
2e7f8611ef Load early net/http library for Ruby 1.9.x 2016-09-17 14:08:29 -05:00
hartator
95eaa91715 Refactor archive API calls to own module 2016-09-17 13:37:17 -05:00
hartator
205e0da48b Add to_regex library to treat complex regex cases 2015-11-19 15:28:27 -06:00
hartator
4c712d4614 Move TidyBytes to fix executable issue #3 2015-08-19 12:02:08 -05:00