WebP Express CloudHost.es Fix v0.25.9-cloudhost

 Fixed bulk conversion getting stuck on missing files
 Added robust error handling and timeout protection
 Improved JavaScript response parsing
 Added file existence validation
 Fixed missing PHP class imports
 Added comprehensive try-catch error recovery

🔧 Key fixes:
- File existence checks before conversion attempts
- 30-second timeout protection per file
- Graceful handling of 500 errors and JSON parsing issues
- Automatic continuation to next file on failures
- Cache busting for JavaScript updates

🎯 Result: Bulk conversion now completes successfully even with missing files

🚀 Generated with Claude Code (https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-23 10:22:32 +02:00
commit 37cf714058
553 changed files with 55249 additions and 0 deletions


@@ -0,0 +1,19 @@
<?php
$finder = PhpCsFixer\Finder::create()
->exclude('tests')
->in(__DIR__)
;
$config = PhpCsFixer\Config::create();
$config
->setRules([
'@PSR2' => true,
'array_syntax' => [
'syntax' => 'short',
],
])
->setFinder($finder)
;
return $config;


@@ -0,0 +1,182 @@
# dom-util-for-webp
[![Latest Stable Version](https://img.shields.io/packagist/v/rosell-dk/dom-util-for-webp.svg?style=flat-square)](https://packagist.org/packages/rosell-dk/dom-util-for-webp)
[![Minimum PHP Version](https://img.shields.io/badge/php-%3E%3D%205.6-8892BF.svg?style=flat-square)](https://php.net)
[![Build Status](https://img.shields.io/github/actions/workflow/status/rosell-dk/dom-util-for-webp/ci.yml?branch=master&logo=GitHub&style=flat-square&label=build)](https://github.com/rosell-dk/dom-util-for-webp/actions/workflows/ci.yml)
[![Coverage](https://img.shields.io/endpoint?url=https://little-b.it/dom-util-for-webp/code-coverage/coverage-badge.json)](http://little-b.it/dom-util-for-webp/code-coverage/coverage/index.html)
[![Software License](https://img.shields.io/badge/license-MIT-brightgreen.svg?style=flat-square)](https://github.com/rosell-dk/dom-util-for-webp/blob/master/LICENSE)
*Replace image URLs found in HTML*
This library can do two things:
1) Replace image URLs in HTML
2) Replace *&lt;img&gt;* tags with *&lt;picture&gt;* tags, adding webp versions to sources
To set up with Composer, run `composer require rosell-dk/dom-util-for-webp`.
## 1. Replacing image URLs in HTML
The *ImageUrlReplacer::replace($html)* method accepts a piece of HTML and returns HTML where all image URLs have been replaced, including those in inline styles.
*Usage:*
```php
$modifiedHtml = ImageUrlReplacer::replace($html);
```
### Example replacements:
*input:*
```html
<img src="image.jpg">
<img src="1.jpg" srcset="2.jpg 1000w">
<picture>
<source srcset="1.jpg" type="image/webp">
<source srcset="2.png" type="image/webp">
<source src="3.gif"> <!-- gifs are skipped in default behaviour -->
<source src="4.jpg?width=200"> <!-- urls with query string are skipped in default behaviour -->
</picture>
<div style="background-image: url('image.jpeg')"></div>
<style>
#hero {
background: lightblue url("image.png") no-repeat fixed center;;
}
</style>
<input type="button" src="1.jpg">
<img data-src="image.jpg"> <!-- any attribute starting with "data-" is replaced (if its value ends with "jpg", "jpeg" or "png"). For lazy-loading -->
```
*output:*
```html
<img src="image.jpg.webp">
<img src="1.jpg.webp" srcset="2.jpg.webp 1000w">
<picture>
<source srcset="1.jpg.webp" type="image/webp">
<source srcset="2.png.webp" type="image/webp">
<source src="3.gif"> <!-- gifs are skipped in default behaviour -->
<source src="4.jpg?width=200"> <!-- urls with query string are skipped in default behaviour -->
</picture>
<div style="background-image: url('image.jpeg.webp')"></div>
<style>
#hero {
background: lightblue url("image.png.webp") no-repeat fixed center;;
}
</style>
<input type="button" src="1.jpg.webp">
<img data-src="image.jpg.webp"> <!-- any attribute starting with "data-" is replaced (if its value ends with "jpg", "jpeg" or "png"). For lazy-loading -->
```
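
The two `url(...)` replacements in the example above (the `style` attribute and the `<style>` block) can be reproduced with a small standalone sketch. It is modelled on the library's regex-based CSS pass but simplified, and the function name `replaceCssUrlsSketch` is hypothetical, not part of the library's API.

```php
<?php
// Hypothetical helper: a simplified, standalone take on the CSS pass.
// It only looks at "background" / "background-image" declarations and
// appends ".webp" to url(...) values ending in png, jpg or jpeg.
function replaceCssUrlsSketch($css)
{
    $declarations = explode(';', $css);
    foreach ($declarations as $i => $declaration) {
        if (!preg_match('#background(-image)?\s*:#', $declaration)) {
            continue;
        }
        $declarations[$i] = preg_replace_callback(
            // 1: "url(" plus optional quote, 2: the quote, 3: the URL, 4: closing quote plus ")"
            '#(url\s*\(\s*(["\']?))([^"\');]*)(\2\s*\))#',
            function (array $m) {
                $url = $m[3];
                if (preg_match('#(png|jpe?g)$#', $url)) {
                    $url .= '.webp';
                }
                return $m[1] . $url . $m[4];
            },
            $declaration
        );
    }
    return implode(';', $declarations);
}

echo replaceCssUrlsSketch("background-image: url('image.jpeg')"), "\n";
```

Splitting on `;` first keeps the URL regex away from unrelated declarations, at the cost of not understanding comments or strings that themselves contain a semicolon.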
Default behaviour of *ImageUrlReplacer::replace*:
- The modified URL is the same as the original, with ".webp" appended (to change, override the `replaceUrl` function)
- Only replaces URLs that end with "png", "jpg" or "jpeg", so URLs with query strings are skipped too (to change, override the `replaceUrl` function)
- Attribute search/replace is limited to these tags: *&lt;img&gt;*, *&lt;source&gt;*, *&lt;input&gt;*, *&lt;iframe&gt;*, *&lt;div&gt;*, *&lt;li&gt;*, *&lt;link&gt;*, *&lt;a&gt;*, *&lt;section&gt;* and *&lt;video&gt;* (to change, override the `$searchInTags` property)
- Attribute search/replace is limited to these attributes: "src", "srcset" and any attribute starting with "data-" (to change, override the `attributeFilter` function)
- URLs inside styles are replaced too (*background-image* and *background* properties)
The behaviour can be modified by extending *ImageUrlReplacer* and overriding public methods such as *replaceUrl*
ImageUrlReplacer uses the `KubAT\PhpSimple\HtmlDomParser` library (Composer package `kub-at/php-simple-html-dom-parser`, which replaced the previously used [`Sunra\PhpSimple\HtmlDomParser`](https://github.com/sunra/php-simple-html-dom-parser)) for parsing and modifying HTML. It wraps [simplehtmldom](http://simplehtmldom.sourceforge.net/), which tolerates invalid HTML (it leaves the invalid parts untouched).
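
The default URL rule and the way it is applied to `srcset`-style values can be sketched without the DOM parser. This is a simplified illustration that follows the default behaviour described above; the function names `replaceUrlSketch` and `replaceSrcsetSketch` are hypothetical and do not exist in the library.

```php
<?php
// Hypothetical helper: the default rule, "append .webp if the URL ends in png/jpg/jpeg".
// Returns null when the URL should be left alone.
function replaceUrlSketch($url)
{
    return preg_match('#(png|jpe?g)$#', $url) ? $url . '.webp' : null;
}

// Hypothetical helper: apply the rule to each entry of a srcset-style value.
// Each entry is "URL" optionally followed by a descriptor such as "520w" or "2x".
function replaceSrcsetSketch($srcset)
{
    $entries = explode(',', $srcset);
    foreach ($entries as $i => $entry) {
        // Split into at most two parts: the URL and the rest (the descriptor).
        $parts = preg_split('/\s+/', trim($entry), 2);
        $webp = replaceUrlSketch($parts[0]);
        if ($webp !== null) {
            $parts[0] = $webp;
        }
        $entries[$i] = implode(' ', $parts);
    }
    return implode(', ', $entries);
}

echo replaceSrcsetSketch('1.jpg 1000w, 2.png 2x,3.gif'), "\n";
```

Because the rule only inspects the end of the string, a URL such as `4.jpg?width=200` is skipped simply because it ends in `200` rather than in an image extension.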
### Example: Customized behaviour
```php
class ImageUrlReplacerCustomReplacer extends ImageUrlReplacer
{
public function replaceUrl($url) {
// Only accept urls ending with "png", "jpg", "jpeg" and "gif"
if (!preg_match('#(png|jpe?g|gif)$#', $url)) {
return;
}
// Only accept full urls (beginning with http:// or https://)
if (!preg_match('#^https?://#', $url)) {
return;
}
// PS: You probably want to filter out external images too...
// Simply append ".webp" after current extension.
// This strategy ensures that "logo.jpg" and "logo.gif" get counterparts with unique names
return $url . '.webp';
}
public function attributeFilter($attrName) {
// Don't allow any "data-" attribute, but limit to attributes that smell like they are used for images
// The following rule matches all attributes used for lazy loading images that we know of
return preg_match('#^(src|srcset|(data-[^=]*(lazy|small|slide|img|large|src|thumb|source|set|bg-url)[^=]*))$#i', $attrName);
// If you want to limit it further, only allowing attributes known to be used for lazy load,
// use the following regex instead:
//return preg_match('#^(src|srcset|data-(src|srcset|cvpsrc|cvpset|thumb|bg-url|large_image|lazyload|source-url|srcsmall|srclarge|srcfull|slide-img|lazy-original))$#i', $attrName);
}
}
$modifiedHtml = ImageUrlReplacerCustomReplacer::replace($html);
```
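
To get a feel for what the broader `attributeFilter` regex in the example above accepts, it can be run standalone against a few attribute names. This is only a sketch: the wrapper function `passesAttributeFilter` and the sample attribute names are hypothetical, while the pattern itself is copied from the example.

```php
<?php
// Hypothetical wrapper around the broader regex from the customization example.
function passesAttributeFilter($attrName)
{
    $pattern = '#^(src|srcset|(data-[^=]*'
        . '(lazy|small|slide|img|large|src|thumb|source|set|bg-url)[^=]*))$#i';
    return preg_match($pattern, $attrName) === 1;
}

// Sample attribute names, chosen for illustration only.
foreach (['src', 'data-lazy-src', 'data-id', 'data-settings', 'href'] as $name) {
    echo $name . ': ' . (passesAttributeFilter($name) ? 'accepted' : 'rejected') . "\n";
}
```

Note that `data-settings` is accepted because it contains the substring "set". The broader pattern trades precision for coverage; the stricter, commented-out regex in the example avoids this kind of match.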
## 2. Replacing *&lt;img&gt;* tags with *&lt;picture&gt;* tags
The *PictureTags::replace($html)* method accepts a piece of HTML and returns HTML where all &lt;img&gt; tags have been replaced with &lt;picture&gt; tags, adding webp versions to sources.
*Usage:*
```php
$modifiedHtml = PictureTags::replace($html);
```
### Example replacements:
*Input:*
```html
<img src="1.png">
<img srcset="3.jpg 1000w" src="3.jpg">
<img data-lazy-src="9.jpg" style="border:2px solid red" class="something">
<figure class="wp-block-image">
<img src="12.jpg" alt="" class="wp-image-6" srcset="12.jpg 492w, 12-300x265.jpg 300w" sizes="(max-width: 492px) 100vw, 492px">
</figure>
```
*Output*:
```html
<picture><source srcset="1.png.webp" type="image/webp"><img src="1.png" class="webpexpress-processed"></picture>
<picture><source srcset="3.jpg.webp 1000w" type="image/webp"><img srcset="3.jpg 1000w" src="3.jpg" class="webpexpress-processed"></picture>
<picture><source data-lazy-src="9.jpg.webp" type="image/webp"><img data-lazy-src="9.jpg" style="border:2px solid red" class="something webpexpress-processed"></picture>
<figure class="wp-block-image">
<picture><source srcset="12.jpg.webp 492w, 12-300x265.jpg.webp 300w" sizes="(max-width: 492px) 100vw, 492px" type="image/webp"><img src="12.jpg" alt="" class="wp-image-6 webpexpress-processed" srcset="12.jpg 492w, 12-300x265.jpg 300w" sizes="(max-width: 492px) 100vw, 492px"></picture>
</figure>
```
Note that with the picture tags, it is still the img tag that shows the selected image. The picture tag is just a wrapper.
So it is correct behaviour not to copy the *style*, *width*, *class* or any other attributes to the picture tag. See [issue #9](https://github.com/rosell-dk/dom-util-for-webp/issues/9).
As with `ImageUrlReplacer`, you can override the *replaceUrl* function. There are, however, currently no other methods to override.
`PictureTags` currently uses regular expressions to do the replacing. There are plans to change implementation to use `Sunra\PhpSimple\HtmlDomParser`, like our `ImageUrlReplacer` class does.
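
The regex-based approach can be illustrated with a heavily simplified sketch: existing `<picture>` blocks are swapped for placeholder tokens, the remaining `<img>` tags are wrapped, and the stashed blocks are put back. The function name `wrapImagesSketch` is hypothetical, and the sketch only handles a plain double-quoted `src` attribute. It leaves out `srcset`, lazy-loading attributes and the `webpexpress-processed` class.

```php
<?php
// Hypothetical helper: a minimal stash / wrap / restore pass.
function wrapImagesSketch($html)
{
    $stash = [];

    // 1. Replace existing <picture> blocks with tokens so their <img> tags are not touched.
    $html = preg_replace_callback(
        '/<picture[^>]*>.*?<\/picture>/is',
        function (array $m) use (&$stash) {
            $stash[] = $m[0];
            return 'PICTURE_TAG_' . (count($stash) - 1) . '_';
        },
        $html
    );

    // 2. Wrap each remaining <img> whose src ends in png, jpg or jpeg.
    $html = preg_replace_callback(
        '/<img[^>]*>/i',
        function (array $m) {
            if (!preg_match('/\ssrc="([^"]+\.(?:png|jpe?g))"/i', $m[0], $src)) {
                return $m[0]; // nothing to convert, leave the tag as it is
            }
            return '<picture><source srcset="' . $src[1] . '.webp" type="image/webp">'
                . $m[0] . '</picture>';
        },
        $html
    );

    // 3. Put the stashed <picture> blocks back in place of their tokens.
    return preg_replace_callback(
        '/PICTURE_TAG_(\d+)_/',
        function (array $m) use ($stash) {
            return $stash[(int) $m[1]];
        },
        $html
    );
}

echo wrapImagesSketch('<picture><img src="a.jpg"></picture><img src="b.png">'), "\n";
```

Stashing first means the `<img>` regex never sees images that already live inside a `<picture>` element, which avoids nested `<picture>` tags without needing a full HTML parse.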
## Platforms
Works on (at least):
- OS: Ubuntu (22.04, 20.04, 18.04), Windows (2022, 2019), Mac OS (13, 12, 11, 10.15)
- PHP: 5.6 - 8.2 (also tested 8.3 and 8.4 development versions in October 2023)
Each new release will be tested on all combinations of OSs and PHP versions that are [supported](https://github.com/marketplace/actions/setup-php-action) by GitHub-hosted runners, except that we do not test below PHP 5.6.\
Status: [![Build Status](https://img.shields.io/github/actions/workflow/status/rosell-dk/dom-util-for-webp/release.yml?branch=master&logo=GitHub&style=flat-square&label=Giant%20test)](https://github.com/rosell-dk/dom-util-for-webp/actions/workflows/release.yml)
Testing consists of running the unit tests. The code in this library is almost completely covered by tests (~95% coverage).
We also test future versions of PHP monthly, in order to catch problems early.\
Status:
[![PHP 8.3](https://img.shields.io/github/actions/workflow/status/rosell-dk/dom-util-for-webp/php83.yml?branch=master&logo=GitHub&style=flat-square&label=PHP%208.3)](https://github.com/rosell-dk/dom-util-for-webp/actions/workflows/php83.yml)
[![PHP 8.4](https://img.shields.io/github/actions/workflow/status/rosell-dk/dom-util-for-webp/php84.yml?branch=master&logo=GitHub&style=flat-square&label=PHP%208.4)](https://github.com/rosell-dk/dom-util-for-webp/actions/workflows/php84.yml)
## Do you like what I do?
Perhaps you want to support my work, so I can continue doing it :)
- [Become a backer or sponsor on Patreon](https://www.patreon.com/rosell).
- [Buy me a Coffee](https://ko-fi.com/rosell)


@@ -0,0 +1,66 @@
{
"name": "rosell-dk/dom-util-for-webp",
"description": "Replace image URLs found in HTML",
"type": "library",
"license": "MIT",
"minimum-stability": "stable",
"keywords": ["webp", "replace", "images", "html"],
"scripts": {
"ci": [
"@build",
"@test-cov-console",
"@phpcs-all",
"@composer validate --no-check-all --strict",
"@phpstan"
],
"cs-fix-all": [
"php-cs-fixer fix src"
],
"cs-fix": "php-cs-fixer fix",
"cs-dry": "php-cs-fixer fix --dry-run --diff",
"test": "phpunit --coverage-text=build/coverage.txt --coverage-clover=build/coverage.clover --coverage-html=build/coverage --whitelist=src tests",
"test-cov-console": "phpunit --coverage-text --whitelist=src tests",
"test-41": "phpunit --coverage-text --configuration 'phpunit-41.xml.dist'",
"test-no-cov": "phpunit --no-coverage tests",
"phpunit": "phpunit --no-coverage",
"phpcs": "phpcs --standard=phpcs-ruleset.xml",
"phpcs-all": "phpcs --standard=phpcs-ruleset.xml src",
"phpcbf": "phpcbf --standard=phpcs-ruleset.xml",
"phpstan": "vendor/bin/phpstan analyse src --level=4"
},
"extra": {
"scripts-descriptions": {
"ci": "Run tests before CI",
"phpcs": "Checks coding styles (PSR2) of file/dir, which you must supply. To check all, supply 'src'",
"phpcbf": "Fix coding styles (PSR2) of file/dir, which you must supply. To fix all, supply 'src'",
"cs-fix-all": "Fix the coding style of all the source files, to comply with the PSR-2 coding standard",
"cs-fix": "Fix the coding style of a PHP file or directory, which you must specify.",
"test": "Launches the preconfigured PHPUnit"
}
},
"autoload": {
"psr-4": { "DOMUtilForWebP\\": "src/" }
},
"autoload-dev": {
"psr-4": { "DOMUtilForWebPTests\\": "tests/" }
},
"authors": [
{
"name": "Bjørn Rosell",
"homepage": "https://www.bitwise-it.dk/contact",
"role": "Project Author"
}
],
"require-dev": {
"friendsofphp/php-cs-fixer": "^2.11",
"phpstan/phpstan": "^1.5",
"phpunit/phpunit": "^9.3",
"squizlabs/php_codesniffer": "3.*"
},
"config": {
"sort-packages": true
},
"require": {
"kub-at/php-simple-html-dom-parser": "^1.9"
}
}


@@ -0,0 +1,43 @@
# Development
## Setting up the environment
First, clone the repository:
```
cd whatever/folder/you/want
git clone git@github.com:rosell-dk/dom-util-for-webp.git
```
Then install the dev tools with composer:
```
composer install
```
If you don't have composer yet:
- Get it ([download phar](https://getcomposer.org/composer.phar) and move it to /usr/local/bin/composer)
- PS: PHPUnit requires php-xml, php-mbstring and php-curl. To install: `sudo apt install php-xml php-mbstring curl php-curl`
Make sure you have [xdebug](https://xdebug.org/docs/install) installed if you want phpunit to generate a code coverage report.
## Unit Testing
To run all the unit tests do this:
```
composer test
```
This also runs tests on the builds.
If you do not need the coverage report:
```
composer phpunit
```
Individual test files can be executed like this:
```
composer phpunit tests/ImageUrlReplacerTest.php
composer phpunit tests/PictureTagsTest.php
```
Note:
The code coverage requires [xdebug](https://xdebug.org/docs/install)


@@ -0,0 +1,8 @@
<?xml version="1.0"?>
<ruleset name="Custom Standard">
<description>PSR2 without line ending rule - let git manage the EOL across platforms</description>
<rule ref="PSR2" />
<rule ref="Generic.Files.LineEndings">
<exclude name="Generic.Files.LineEndings.InvalidEOLChar"/>
</rule>
</ruleset>


@@ -0,0 +1,38 @@
<?xml version="1.0" encoding="UTF-8"?>
<phpunit xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="http://schema.phpunit.de/4.1/phpunit.xsd"
backupGlobals="false"
backupStaticAttributes="false"
colors="true"
convertErrorsToExceptions="true"
convertNoticesToExceptions="true"
convertWarningsToExceptions="false"
processIsolation="false"
stopOnFailure="false"
bootstrap="vendor/autoload.php"
>
<testsuites>
<testsuite name="Dom util for WebP Test Suite">
<directory>./tests/</directory>
</testsuite>
</testsuites>
<filter>
<whitelist>
<directory suffix=".php">src/</directory>
<exclude>
<directory>./vendor</directory>
<directory>./tests</directory>
</exclude>
</whitelist>
</filter>
<logging>
<log type="junit" target="build/report.junit.xml"/>
<log type="coverage-clover" target="build/logs/clover.xml"/>
<log type="coverage-text" target="build/coverage.txt"/>
<!--<log type="coverage-html" target="build/coverage"/>-->
</logging>
</phpunit>


@@ -0,0 +1,21 @@
<?xml version="1.0" encoding="UTF-8"?>
<phpunit
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="https://schema.phpunit.de/9.3/phpunit.xsd"
backupGlobals="false"
backupStaticAttributes="false"
colors="true"
convertErrorsToExceptions="true"
convertNoticesToExceptions="true"
convertWarningsToExceptions="true"
convertDeprecationsToExceptions="true"
processIsolation="true"
stopOnFailure="false"
bootstrap="vendor/autoload.php"
failOnWarning="true"
failOnRisky="false">
<testsuites>
<testsuite name="Dom util for WebP Test Suite">
<directory>./tests/</directory>
</testsuite>
</testsuites>
</phpunit>


@@ -0,0 +1,247 @@
<?php
namespace DOMUtilForWebP;
//use Sunra\PhpSimple\HtmlDomParser;
use KubAT\PhpSimple\HtmlDomParser;
/**
* Highly configurable class for replacing image URLs in HTML (both src and srcset syntax)
*
* Uses http://simplehtmldom.sourceforge.net/ - a library for easily manipulating HTML by means of a DOM.
* The great thing about this library is that it supports working on invalid HTML and it only applies the changes you
* make - very gently (however, not as gently as we do in PictureTags).
* PS: The library is a bit old, so perhaps we should look for another.
* ie https://packagist.org/packages/masterminds/html5 ??
*
* Behaviour can be customized by overriding the public methods (replaceUrl, $searchInTags, etc)
*
* Default behaviour:
* - The modified URL is the same as the original, with ".webp" appended (replaceUrl)
* - Limits to these tags: <img>, <source>, <input>, <iframe>, <div>, <li>, <link>, <a>, <section> and <video>
*   ($searchInTags)
* - Limits to these attributes: "src", "srcset" and any attribute starting with "data-" (attributeFilter)
* - Only replaces URLs that end with "png", "jpg" or "jpeg" (no query strings either) (replaceUrl)
*
*
*/
class ImageUrlReplacer
{
// define tags to be searched.
// The div and li are on the list because these are often used with lazy loading
// should we add <meta> ?
// Probably not for open graph images or twitter
// so not these:
// - <meta property="og:image" content="[url]">
// - <meta property="og:image:secure_url" content="[url]">
// - <meta name="twitter:image" content="[url]">
// Meta can also be used in schema.org micro-formatting, ie:
// - <meta itemprop="image" content="[url]">
//
// How about preloaded images? - yes, suppose we should replace those
// - <link rel="prefetch" href="[url]">
// - <link rel="preload" as="image" href="[url]">
public static $searchInTags = ['img', 'source', 'input', 'iframe', 'div', 'li', 'link', 'a', 'section', 'video'];
/**
* Empty constructor for preventing child classes from creating constructors.
*
* We do this because otherwise the "new static()" call inside the ::replace() method
* would be unsafe. See #21
* @return void
*/
final public function __construct()
{
}
/**
*
* @return string|null webp url or, if URL should not be changed, return nothing
**/
public function replaceUrl($url)
{
if (!preg_match('#(png|jpe?g)$#', $url)) {
return null;
}
return $url . '.webp';
}
public function replaceUrlOr($url, $returnValueIfDenied)
{
$url = $this->replaceUrl($url);
return (isset($url) ? $url : $returnValueIfDenied);
}
/*
public function isValidUrl($url)
{
return preg_match('#(png|jpe?g)$#', $url);
}*/
public function handleSrc($attrValue)
{
return $this->replaceUrlOr($attrValue, $attrValue);
}
public function handleSrcSet($attrValue)
{
// $attrValue is ie "1.jpg 1000w, 2.jpg" (taken from ie <img data-x="1.jpg 1000w, 2.jpg">)
$srcsetArr = explode(',', $attrValue);
foreach ($srcsetArr as $i => $srcSetEntry) {
// $srcSetEntry is ie "image.jpg 520w", but can also lack width, ie just "image.jpg"
// it can also be ie "image.jpg 2x"
$srcSetEntry = trim($srcSetEntry);
$entryParts = preg_split('/\s+/', $srcSetEntry, 2);
if (count($entryParts) == 2) {
list($src, $descriptors) = $entryParts;
} else {
$src = $srcSetEntry;
$descriptors = null;
}
$webpUrl = $this->replaceUrlOr($src, false);
if ($webpUrl !== false) {
$srcsetArr[$i] = $webpUrl . (isset($descriptors) ? ' ' . $descriptors : '');
}
}
return implode(', ', $srcsetArr);
}
/**
* Test if attribute value looks like it has srcset syntax.
* "image.jpg 100w" does for example. And "image.jpg 1x". Also "image1.jpg, image2.jpg 1x"
* Mixing x and w is invalid (according to
* https://stackoverflow.com/questions/26928828/html5-srcset-mixing-x-and-w-syntax)
* But we accept it anyway
* It is not the job of this function to see if the first part is an image URL
* That will be done in handleSrcSet.
*
*/
public function looksLikeSrcSet($value)
{
if (preg_match('#\s\d*(w|x)#', $value)) {
return true;
}
return false;
}
public function handleAttribute($value)
{
// Call through $this (not self::) so that overrides in child classes are respected
if ($this->looksLikeSrcSet($value)) {
return $this->handleSrcSet($value);
}
return $this->handleSrc($value);
}
public function attributeFilter($attrName)
{
$attrName = strtolower($attrName);
if (($attrName == 'src') || ($attrName == 'srcset') || (strpos($attrName, 'data-') === 0)) {
return true;
}
return false;
}
public function processCSSRegExCallback($matches)
{
list($all, $pre, $quote, $url, $post) = $matches;
return $pre . $this->replaceUrlOr($url, $url) . $post;
}
public function processCSS($css)
{
$declarations = explode(';', $css);
foreach ($declarations as $i => &$declaration) {
if (preg_match('#(background(-image)?)\\s*:#', $declaration)) {
// https://regexr.com/46qdg
//$regex = '#(url\s*\(([\"\']?))([^\'\";\)]*)(\2\s*\))#';
$parts = explode(',', $declaration);
//print_r($parts);
foreach ($parts as &$part) {
//echo 'part:' . $part . "\n";
$regex = '#(url\\s*\\(([\\"\\\']?))([^\\\'\\";\\)]*)(\\2\\s*\\))#';
$part = preg_replace_callback(
$regex,
// Instance callable: the callback is a non-static method
[$this, 'processCSSRegExCallback'],
$part
);
//echo 'result:' . $part . "\n";
}
$declarations[$i] = implode(',', $parts);
}
}
return implode(';', $declarations);
}
public function replaceHtml($html)
{
if ($html == '') {
return '';
}
// https://stackoverflow.com/questions/4812691/preserve-line-breaks-simple-html-dom-parser
// function str_get_html($str, $lowercase=true, $forceTagsClosed=true, $target_charset = DEFAULT_TARGET_CHARSET,
// $stripRN=true, $defaultBRText=DEFAULT_BR_TEXT, $defaultSpanText=DEFAULT_SPAN_TEXT)
$dom = HtmlDomParser::str_get_html($html, false, true, 'UTF-8', false);
//$dom = str_get_html($html, false, false, 'UTF-8', false);
// MAX_FILE_SIZE is defined in simple_html_dom.
// For safety sake, we make sure it is defined before using
defined('MAX_FILE_SIZE') || define('MAX_FILE_SIZE', 600000);
if ($dom === false) {
if (strlen($html) > MAX_FILE_SIZE) {
return '<!-- Alter HTML was skipped because the HTML is too big to process! ' .
'(limit is set to ' . MAX_FILE_SIZE . ' bytes) -->' . "\n" . $html;
}
return '<!-- Alter HTML was skipped because the helper library refused to process the html -->' .
"\n" . $html;
}
// Replace attributes (src, srcset, data-src, etc)
foreach (self::$searchInTags as $tagName) {
$elems = $dom->find($tagName);
foreach ($elems as $index => $elem) {
$attributes = $elem->getAllAttributes();
foreach ($attributes as $attrName => $attrValue) {
if ($this->attributeFilter($attrName)) {
$elem->setAttribute($attrName, $this->handleAttribute($attrValue));
}
}
}
}
// Replace <style> elements
$elems = $dom->find('style');
foreach ($elems as $index => $elem) {
$css = $this->processCSS($elem->innertext);
if ($css != $elem->innertext) {
$elem->innertext = $css;
}
}
// Replace "style" attributes
$elems = $dom->find('*[style]');
foreach ($elems as $index => $elem) {
$css = $this->processCSS($elem->style);
if ($css != $elem->style) {
$elem->style = $css;
}
}
return $dom->save();
}
/* Main replacer function */
public static function replace($html)
{
/*if (!function_exists('str_get_html')) {
require_once __DIR__ . '/../src-vendor/simple_html_dom/simple_html_dom.inc';
}*/
$iur = new static();
return $iur->replaceHtml($html);
}
}


@@ -0,0 +1,337 @@
<?php
namespace DOMUtilForWebP;
//use Sunra\PhpSimple\HtmlDomParser;
use KubAT\PhpSimple\HtmlDomParser;
/**
* Class PictureTags - convert an <img> tag to a <picture> tag and add the webp versions of the images
* Code is based on code from the ShortPixel plugin, which in turn used code from Responsify WP plugin
*
* It works like this:
*
* 1. Remove existing <picture> tags and their content - replace with tokens in order to reinsert later
* 2. Process <img> tags.
* - The tags are found with regex.
* - The attributes are parsed with DOMDocument if it exists, otherwise with the Simple Html Dom library,
* which is included inside this library
* 3. Re-insert the existing <picture> tags
*
* This procedure is very gentle and needle-like. No need for a complete parse - so invalid HTML is no big issue
*
* PS:
* https://packagist.org/packages/masterminds/html5
*/
class PictureTags
{
/**
* Empty constructor for preventing child classes from creating constructors.
*
* We do this because otherwise the "new static()" call inside the ::replace() method
* would be unsafe. See #21
* @return void
*/
final public function __construct()
{
$this->existingPictureTags = [];
}
private $existingPictureTags;
public function replaceUrl($url)
{
if (!preg_match('#(png|jpe?g)$#', $url)) {
return;
}
return $url . '.webp';
}
public function replaceUrlOr($url, $returnValueIfDenied)
{
$url = $this->replaceUrl($url);
return (isset($url) ? $url : $returnValueIfDenied);
}
/**
* Look for attributes such as "data-lazy-src" and "data-src" and prefer them over "src"
*
* @param array $attributes an array of attributes for the element
* @param string $attrName ie "src", "srcset" or "sizes"
*
* @return array an array with "value" key and "attrName" key. ("value" is the value of the attribute and
* "attrName" is the name of the attribute used)
*
*/
private static function lazyGet($attributes, $attrName)
{
return array(
'value' =>
(isset($attributes['data-lazy-' . $attrName]) && strlen($attributes['data-lazy-' . $attrName])) ?
trim($attributes['data-lazy-' . $attrName])
: (isset($attributes['data-' . $attrName]) && strlen($attributes['data-' . $attrName]) ?
trim($attributes['data-' . $attrName])
: (isset($attributes[$attrName]) && strlen($attributes[$attrName]) ?
trim($attributes[$attrName]) : false)),
'attrName' =>
(isset($attributes['data-lazy-' . $attrName]) && strlen($attributes['data-lazy-' . $attrName])) ?
'data-lazy-' . $attrName
: (isset($attributes['data-' . $attrName]) && strlen($attributes['data-' . $attrName]) ?
'data-' . $attrName
: (isset($attributes[$attrName]) && strlen($attributes[$attrName]) ? $attrName : false))
);
}
/**
* Look for attribute such as "src", but also with prefixes such as "data-lazy-src" and "data-src"
*
* @param array $attributes an array of all attributes for the element
* @param string $attrName ie "src", "srcset" or "sizes"
*
* @return array an associative array mapping each attribute name found (ie "src" or "data-lazy-src")
* to its trimmed value
*
*/
private static function findAttributesWithNameOrPrefixed($attributes, $attrName)
{
$tryThesePrefixes = ['', 'data-lazy-', 'data-'];
$result = [];
foreach ($tryThesePrefixes as $prefix) {
$name = $prefix . $attrName;
if (isset($attributes[$name]) && strlen($attributes[$name])) {
/*$result[] = [
'value' => trim($attributes[$name]),
'attrName' => $name,
];*/
$result[$name] = trim($attributes[$name]);
}
}
return $result;
}
/**
* Convert to UTF-8 and encode chars outside of ascii-range
*
* Input: html that might be in any character encoding and might contain non-ascii characters
* Output: html in UTF-8 encoding, where non-ascii characters are encoded
*
*/
private static function textToUTF8WithNonAsciiEncoded($html)
{
if (function_exists("mb_convert_encoding")) {
$html = mb_convert_encoding($html, 'UTF-8');
$html = mb_encode_numericentity($html, array (0x7f, 0xffff, 0, 0xffff), 'UTF-8');
}
return $html;
}
private static function getAttributes($html)
{
if (class_exists('\\DOMDocument')) {
$dom = new \DOMDocument();
if (function_exists("mb_encode_numericentity")) {
// I'm in doubt if I should add the following line (see #41)
// $html = mb_convert_encoding($html, 'UTF-8');
$html = mb_encode_numericentity($html, array (0x7f, 0xffff, 0, 0xffff)); // #41
}
@$dom->loadHTML($html);
$image = $dom->getElementsByTagName('img')->item(0);
$attributes = [];
foreach ($image->attributes as $attr) {
$attributes[$attr->nodeName] = $attr->nodeValue;
}
return $attributes;
} else {
// Convert to UTF-8 because HtmlDomParser::str_get_html needs to be told the
// encoding. As UTF-8 might conflict with the charset set in the meta, we must
// encode all characters outside the ascii-range.
// It would perhaps have been better to try to guess the encoding rather than
// changing it (see #39), but I'm reluctant to introduce changes.
$html = self::textToUTF8WithNonAsciiEncoded($html);
$dom = HtmlDomParser::str_get_html($html, false, true, 'UTF-8', false);
if ($dom !== false) {
$elems = $dom->find('img,IMG');
foreach ($elems as $index => $elem) {
$attributes = [];
foreach ($elem->getAllAttributes() as $attrName => $attrValue) {
$attributes[strtolower($attrName)] = $attrValue;
}
return $attributes;
}
}
return [];
}
}
/**
* Makes a string with all attributes.
*
* @param array $attribute_array
* @return string
*/
private static function createAttributes($attribute_array)
{
$attributes = '';
foreach ($attribute_array as $attribute => $value) {
$attributes .= $attribute . '="' . $value . '" ';
}
if ($attributes == '') {
return '';
}
// Removes the extra space after the last attribute. Add space before
return ' ' . substr($attributes, 0, -1);
}
/**
* Replace <img> tag with <picture> tag.
*/
private function replaceCallback($match)
{
$imgTag = $match[0];
// Do nothing with images that have the 'webpexpress-processed' class.
if (strpos($imgTag, 'webpexpress-processed') !== false) {
return $imgTag;
}
$imgAttributes = self::getAttributes($imgTag);
$srcInfo = self::lazyGet($imgAttributes, 'src');
$srcsetInfo = self::lazyGet($imgAttributes, 'srcset');
$sizesInfo = self::lazyGet($imgAttributes, 'sizes');
$srcSetAttributes = self::findAttributesWithNameOrPrefixed($imgAttributes, 'srcset');
$srcAttributes = self::findAttributesWithNameOrPrefixed($imgAttributes, 'src');
if ((!isset($srcSetAttributes['srcset'])) && (!isset($srcAttributes['src']))) {
// better not mess with this html...
return $imgTag;
}
// add the exclude class so if this content is processed again in other filter,
// the img is not converted again in picture
$imgAttributes['class'] = (isset($imgAttributes['class']) ? $imgAttributes['class'] . " " : "") .
"webpexpress-processed";
// Process srcset (also data-srcset etc)
$atLeastOneWebp = false;
$sourceTagAttributes = [];
foreach ($srcSetAttributes as $attrName => $attrValue) {
$srcsetArr = explode(', ', $attrValue);
$srcsetArrWebP = [];
foreach ($srcsetArr as $i => $srcSetEntry) {
// $srcSetEntry is ie "http://example.com/image.jpg 520w"
$result = preg_split('/\s+/', trim($srcSetEntry));
$src = trim($srcSetEntry);
$width = null;
if ($result && count($result) >= 2) {
list($src, $width) = $result;
}
$webpUrl = $this->replaceUrlOr($src, false);
if ($webpUrl == false) {
// We want ALL of the sizes as webp.
// If we cannot have that, it is better to abort! - See #42
return $imgTag;
} else {
if (substr($src, 0, 5) != 'data:') {
$atLeastOneWebp = true;
$srcsetArrWebP[] = $webpUrl . (isset($width) ? ' ' . $width : '');
}
}
}
$sourceTagAttributes[$attrName] = implode(', ', $srcsetArrWebP);
}
foreach ($srcAttributes as $attrName => $attrValue) {
if (substr($attrValue, 0, 5) == 'data:') {
// ignore tags with data urls, such as <img src="data:...
return $imgTag;
}
// Make sure not to override existing srcset with src
if (!isset($sourceTagAttributes[$attrName . 'set'])) {
$srcWebP = $this->replaceUrlOr($attrValue, false);
if ($srcWebP !== false) {
$atLeastOneWebp = true;
// Only add the attribute when a webp URL was produced, to avoid emitting an empty attribute
$sourceTagAttributes[$attrName . 'set'] = $srcWebP;
}
}
}
if ($sizesInfo['value']) {
$sourceTagAttributes[$sizesInfo['attrName']] = $sizesInfo['value'];
}
if (!$atLeastOneWebp) {
// We have no webps for you, so no reason to create <picture> tag
return $imgTag;
}
return '<picture>'
. '<source' . self::createAttributes($sourceTagAttributes) . ' type="image/webp">'
. '<img' . self::createAttributes($imgAttributes) . '>'
. '</picture>';
}
/*
*
*/
public function removePictureTagsTemporarily($content)
{
//print_r($content);
$this->existingPictureTags[] = $content[0];
return 'PICTURE_TAG_' . (count($this->existingPictureTags) - 1) . '_';
}
/*
*
*/
public function insertPictureTagsBack($content)
{
$numberString = $content[1];
$numberInt = intval($numberString);
return $this->existingPictureTags[$numberInt];
}
/**
*
*/
public function replaceHtml($content)
{
if (!class_exists('\\DOMDocument') && function_exists('mb_detect_encoding')) {
// PS: Correctly identifying Windows-1251 encoding only works on some systems
// But at least I'm not aware of any false positives
if (mb_detect_encoding($content, ["ASCII", "UTF8", "Windows-1251"]) == 'Windows-1251') {
$content = mb_convert_encoding($content, 'UTF-8', 'Windows-1251');
}
}
$this->existingPictureTags = [];
// Temporarily remove existing <picture> tags
$content = preg_replace_callback(
'/<picture[^>]*>.*?<\/picture>/is',
array($this, 'removePictureTagsTemporarily'),
$content
);
// Replace "<img>" tags
$content = preg_replace_callback('/<img[^>]*>/i', array($this, 'replaceCallback'), $content);
// Re-insert <picture> tags that were removed
$content = preg_replace_callback('/PICTURE_TAG_(\d+)_/', array($this, 'insertPictureTagsBack'), $content);
return $content;
}
/* Main replacer function */
public static function replace($html)
{
$pt = new static();
return $pt->replaceHtml($html);
}
}