WebP-eXpress/docs/regex.md
Malin 37cf714058 WebP Express CloudHost.es Fix v0.25.9-cloudhost
 Fixed bulk conversion getting stuck on missing files
 Added robust error handling and timeout protection
 Improved JavaScript response parsing
 Added file existence validation
 Fixed missing PHP class imports
 Added comprehensive try-catch error recovery

🔧 Key fixes:
- File existence checks before conversion attempts
- 30-second timeout protection per file
- Graceful handling of 500 errors and JSON parsing issues
- Automatic continuation to next file on failures
- Cache busting for JavaScript updates

🎯 Result: Bulk conversion now completes successfully even with missing files

🚀 Generated with Claude Code (https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-23 10:22:32 +02:00

155 lines
7.2 KiB
Markdown

Pattern used for image urls:
in attributes: https://regexr.com/46jat
in css: https://regexr.com/46jcg
In case reqexr.com should be down, here are the content:
# in attributes
*pattern:*
(?<=(?:<(img|source|input|iframe)[^>]*\s+(src|srcset|data-[^=]*)\s*=\s*[\"\']?))(?:[^\"\'>]+)(\.png|\.jp[e]?g)(\s\d+w)?(?=\/?[\"\'\s\>])
*text:*
Notice: The pattern is meant for PHP and contains syntax which only works in some browsers. It works in Chrome. Not in Firefox.
The following should produce matches:
<img src="header.jpg">
<img src="/header.jpg">
<img src="http://example.com/header.jpeg" alt="">
<img src="http://example.com/header.jpg">
<img src="http://example.com/header.jpg"/>
<img src = "http://example.com/header.jpg">
<img src=http://example.com/header.jpg alt="">
<img src=http://example.com/header.jpg>
<img src=http://example.com/header.jpg alt="hello">
<img src=http://example.com/header.jpg />
<img src=http://example.com/header_.jpg/>
<picture><source src="http://example.com/header.jpg"><img src="http://example.com/header.jpg"></picture>
<input type="image" src="http://example.com/flamingo.jpg">
<iframe src="http://example.com/image.jpg"></iframe>
In srcset, the whole attribute must be matched
<img src="http://example.com/header.jpg" srcset="http://example.com/header.jpg 1000w">
<img src="http://example.com/header.jpg" srcset="http://example.com/header.jpg 1000w,http://example.com/header.jpg 1000w, http://example.com/header.jpg 2000w">
<img src="http://example.com/header.jpg" srcset="http://example.com/header-150x150.jpg 500w,http://example.com/header.jpg-300x300.jpg" sizes="(max-width: 480px) 100vw, (max-width: 900px) 33vw, 254px" alt="" width="100" height="100">
Common lazy load attributes are matched:
<img data-cvpsrc="http://example.com/header.jpg">
<img data-cvpset="http://example.com/header.jpg">
<img data-thumb="http://example.com/header.jpg">
<img data-bg-url="http://example.com/header.jpg">
<img data-large_image="http://example.com/header.jpg">
<img data-lazyload="http://example.com/header.jpg">
<img data-source-url="http://example.com/header.jpg">
<img data-srcsmall="http://example.com/header.jpg">
<img data-srclarge="http://example.com/header.jpg">
<img data-srcfull="http://example.com/header.jpg">
<img data-slide-img="http://example.com/header.jpg">
<img data-lazy-original="http://example.com/header.jpg">
The following should NOT produce matches:
-----------------------------------------
Ignore URLs with query string:
<img src="http://example.com/header.jpg?width=200">
<img src="http://example.com/tegning.jpg.webp" alt="">
<img src="http://example.com/tegning.jpglidilo" alt="">
<img src="http://example.com/header.jpg/hi-res">
<img src=http://example.com/header.gif alt=nice-jpg>
<img src="http://example.com/tegning.webp" alt="">
src="http://example.com/header.jpeg"
<article data-src="http://example.com/header.jpg" />
<img><script src="http://example.com/script.js?preload=image.jpg">
I use another pattern for matching image urls in styles: https://regexr.com/46jcg
It matches stuff like this:
<div style="background-image: url('http://example.com/image.png'), url("/image2.jpeg", url(http://example.com/image3.jpg);"></div>
<div style="background: url ("http://example.com/image2.jpg")"></div>
<style>#myphoto {background: url("http://example.com/image2.jpg")}</style>
I have another pattern where we allow QS here: https://regexr.com/46ivi
PS: The rules are used for the WebP Express plugin for Wordpress
PPS: This regex is used in WPFastestCache (not just images)
// $content = preg_replace_callback("/(srcset|src|href|data-cvpsrc|data-cvpset|data-thumb|data-bg-url|data-large_image|data-lazyload|data-source-url|data-srcsmall|data-srclarge|data-srcfull|data-slide-img|data-lazy-original)\s{0,2}\=[\'\"]([^\'\"]+)[\'\"]/i", array($this, 'cdn_replace_urls'), $content);
PPPS:
As we are limiting to a few tags (img, source, input, etc), and only match image urls ending with (png|jpe?g), I deem it ok to match in ANY "data-" attribute.
But if you want to limit it to attributes that smells like they are used for images you can do this:
(src|srcset|data-[^=]*(lazy|small|slide|img|large|src|thumb|source|set|bg-url)[^=]*)
That will catch the following known and more: data-cvpsrc|data-cvpset|data-thumb|data-bg-url|data-large_image|data-lazyload|data-source-url|data-srcsmall|data-srclarge|data-srcfull|data-slide-img|data-lazy-original
# in style
*pattern:*
((?<=(?:((style\s*=)|(\<\s*style)).*background(-image)?\s*:\s*url\s*\([\"\']?)|(((style\s*=)|(\<\s*style)).*url.*,\s*url\([\"\']?))[^\"\']*\.(jpe?g|png))(?=[\"\'\s\>)])
*text:*
Notice: The pattern is meant for PHP and contains syntax which only works in some browsers. It works in Chrome. Not in Firefox.
The following should produce matches:
<style>#myphoto {background: url("http://example.com/image2.jpg")}</style>
<div style="background-image: url('http://example.com/image.png'), url("/image2.jpeg"), url(http://example.com/image3.jpg);"></div>
<div style="background: url ("http://example.com/image2.jpg")"></div>
<style>#myphoto {background: url("http://example.com/image2.jpg"), url("image2.jpeg"}</style>
Not these:
----------
GIFs are disallowed:
<div style="background-image: url("http://example.com/image.gif"), url("http://example.com/image2.gif", url("image3.gif");"></div>
Querystrings are disallowed:
<div style="background-image: url('http://example.com/image.jpg?no-qs!')"></div>
HTML attributes disallowed:
<img src="header.jpg">
Go with style: background: url("http://example.com/image2.jpg")
And none of this either:
<div style="background-image: url('http://example.com/image.jpgelegi')"></div>
<img src="header.jpg">
<img src="/header.jpg">
<img src="http://example.com/header.jpeg" alt="">
<img src="http://example.com/header.jpg">
<img src="http://example.com/header.jpg"/>
<img src = "http://example.com/header.jpg">
<img src=http://example.com/header.jpg alt="">
<img src=http://example.com/header.jpg>
<img src=http://example.com/header.jpg alt="hello">
<img src=http://example.com/header.jpg />
<img src=http://example.com/header_.jpg/>
<picture><source src="http://example.com/header.jpg"><img src="http://example.com/header.jpg"></picture>
<input type="image" src="http://example.com/flamingo.jpg">
<iframe src="http://example.com/image.jpg"></iframe>
<img src="http://example.com/header.jpg" srcset="http://example.com/header.jpg 1000w">
<img src="http://example.com/header.jpg" srcset="http://example.com/header.jpg 1000w,http://example.com/header.jpg 1000w, http://example.com/header.jpg 2000w">
<img src="http://example.com/header.jpg" srcset="http://example.com/header-150x150.jpg 500w,http://example.com/header.jpg-300x300.jpg" sizes="(max-width: 480px) 100vw, (max-width: 900px) 33vw, 254px" alt="" width="100" height="100">
<img src="http://example.com/tegning.jpg.webp" alt="">
<img src="http://example.com/tegning.jpglidilo" alt="">
<img src="http://example.com/header.jpg/hi-res">
<img src=http://example.com/header.gif alt=nice-jpg>
<img src="http://example.com/tegning.webp" alt="">
src="http://example.com/header.jpeg"
<article data-src="http://example.com/header.jpg" />
<img><script src="http://example.com/script.js?preload=image.jpg">
I use another pattern for matching image urls in HTML attributes:
https://regexr.com/46jat