This repository has been archived by the owner on Apr 21, 2023. It is now read-only.

Design Doc: Embedding Image settings in .pagespeed. URLs

Jeff Kaufman edited this page Jan 9, 2017 · 1 revision

Embedding Image (and other) settings in .pagespeed. URLs

Joshua Marantz, 2013-01-11

Currently mod_pagespeed’s multi-server model assumes that each server has the same configuration file. This model may break down in large-scale deployments. In particular, one very large company that would like to integrate mod_pagespeed cannot deploy this model.

This customer currently has an image-rewriting solution based on an internal tool which allows per-page image tuning. Unfortunately, the images are not organized per-page directly in the image-serving cluster, so we cannot replicate the settings via htaccess files; we must encode the tuning in the image URLs.

The good news is that PSOL already implements a mechanism to tune image settings per URL: the query-params adjusting filter-set and compression. However, the current system has two drawbacks:

  • The set of options that can be tuned by query-param is limited
  • The syntax is quite verbose; it’s intended for debugging and experimentation.

It does have one notable property, though: the options we currently expose as query-params are ones we believe do not create a significant attack vector, at least no more than the URL syntax itself does.

To resolve this, we propose a more concise and complete syntax that still whitelists specific options, so that the feature is secure by default. Our strategy is to use the two-letter codes we now have for every filter and the 2-4 letter codes we now have for every Option, along with the standard string-conversion API available on every option.

To minimize noise, each filter-class will define a set of related filters & options that are relevant to the URLs it rewrites, and only the non-default settings for those filters and options will be included. For example, the query-param

?PsolOpt=gp,jw,pj,rj,rp,rw,iq:73

enables six image-related filters (gp, jw, pj, rj, rp, rw) and sets JPEG quality to 73 (iq:73), per the codes defined in rewrite_options.cc. Note that the options include only image-related settings.
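As a rough illustration of the round trip, here is a minimal sketch of generating and parsing such a value. It is written in Python for brevity rather than in PSOL's C++. The whitelist contents, the default of 85 for `iq`, and the function names `encode_psol_opt` and `decode_psol_opt` are assumptions made for this example; only the codes and the `code:value` syntax come from the query-param shown above.

```python
# Hypothetical whitelists for the image filter-class; the real codes are
# defined in rewrite_options.cc.
RELATED_FILTERS = {"gp", "jw", "pj", "rj", "rp", "rw"}
RELATED_OPTION_DEFAULTS = {"iq": "85"}  # assumed default, for illustration only


def encode_psol_opt(enabled_filters, options):
    """Build the PsolOpt value from whitelisted, non-default settings."""
    parts = sorted(f for f in enabled_filters if f in RELATED_FILTERS)
    for code in sorted(options):
        value = str(options[code])
        # Skip options that are not whitelisted or that are at their default.
        if code in RELATED_OPTION_DEFAULTS and value != RELATED_OPTION_DEFAULTS[code]:
            parts.append(f"{code}:{value}")
    return ",".join(parts)


def decode_psol_opt(value):
    """Parse a PsolOpt value, rejecting anything outside the whitelists."""
    filters, options = set(), {}
    for token in filter(None, value.split(",")):
        code, sep, setting = token.partition(":")
        if sep:
            if code not in RELATED_OPTION_DEFAULTS:
                raise ValueError(f"option not whitelisted: {code}")
            options[code] = setting
        else:
            if code not in RELATED_FILTERS:
                raise ValueError(f"filter not whitelisted: {code}")
            filters.add(code)
    return filters, options
```

Rejecting unknown codes at parse time is what keeps the whitelist meaningful: a URL cannot switch on a filter or option that the filter-class did not declare as related.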

Adding these query-params makes pages bigger, so this option must be off by default. It would be enabled via the pagespeed.conf setting "ModPagespeedAddOptionsToUrls".

We may choose to get sophisticated about what options we include in the query-param. For example, if an option is defined in a VirtualHost but then overridden in an .htaccess file, we might only include the delta from the .htaccess file in the query-param. This would help reduce the size of the query-params.
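A minimal sketch of that delta computation, assuming options are held as a plain mapping from option code to string value (a simplification of RewriteOptions made for this example; `options_delta` and the sample values are hypothetical):

```python
def options_delta(base, override):
    """Return only the settings in `override` that differ from `base`."""
    return {code: value for code, value in override.items()
            if base.get(code) != value}


# Hypothetical settings: the VirtualHost sets both options, and the
# .htaccess file changes only the JPEG quality.
vhost = {"iq": "85", "rj": "on"}
htaccess = {"iq": "73", "rj": "on"}

delta = options_delta(vhost, htaccess)  # only "iq" would go into the URL
```

Under this scheme the server handling the resource request would need the same VirtualHost settings in order to rebuild the full option set from the delta.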

Other Options Considered

Encoding RewriteOptions in protobuf, serializing to binary & base64-encoding: this would probably be more compact, but it would be completely impossible to understand & debug. The current proposal is not exactly user-friendly, but it’s decodable with the naked eye, which I view as a benefit.

Using the RewriteOptions->signature(): this is not reversible. We need to be able to read the options, construct a new RewriteOptions object, and apply that to the rewrite.

Implementation

The first CL, which covers images only, is in review.

Drawbacks

"Most proxies, most notably Squid up through version 3.0, do not cache resources with a "?" in their URL even if a Cache-control: public header is present in the response. To enable proxy caching for these resources, remove query strings from references to static resources, and instead encode the parameters into the file names themselves."

See https://developers.google.com/speed/docs/best-practices/caching#LeverageProxyCaching

Steve has a post on the subject as well: http://www.stevesouders.com/blog/2008/08/23/revving-filenames-dont-use-querystring/

Complexities

We would like to make sure that the metadata cache (MD-cache) entry written to memcached in the HTML flow is available when serving resources. This is only needed if the rewritten resource itself is evicted from cache. If that does occur, the MD-cache lookup from the resource fetch would likely fail: the RewriteOptions we construct for a resource will have only the options related to that resource, so its signature will not match the one written in the HTML flow, which includes all the options. When this feature is enabled, we should tweak the options-signature generation to use only the related options for the filter.
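The mismatch, and the proposed fix, can be sketched as follows. This is not how PSOL computes its signature: SHA-256 over sorted `code=value` pairs is a stand-in, and the option sets, the unrelated code `xx`, and the `signature` helper are hypothetical.

```python
import hashlib


def signature(options, related=None):
    """Hash the options; if `related` is given, only those codes contribute."""
    items = sorted((code, value) for code, value in options.items()
                   if related is None or code in related)
    text = ";".join(f"{code}={value}" for code, value in items)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


IMAGE_RELATED = {"rj", "iq"}  # hypothetical related-options set for images

# The HTML flow sees every option; the resource fetch sees only what was
# encoded in the URL.
html_flow = {"rj": "on", "iq": "73", "xx": "on"}
resource_fetch = {"rj": "on", "iq": "73"}

full_match = signature(html_flow) == signature(resource_fetch)
related_match = (signature(html_flow, IMAGE_RELATED)
                 == signature(resource_fetch, IMAGE_RELATED))
```

With the full option set the two signatures differ, so the lookup would miss; restricted to the related options they agree.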

Follow-ups

  1. Clearly the MD-cache key issue needs to be resolved.

  2. It won’t be that bad to encode these options within the .pagespeed. URL. The parsing/generation of that is purely in ResourceNamer, which can accommodate incremental complexity. One thought: expand the current syntax of ENCODING.pagespeed.ID.HASH.EXT so that the "ID" can have an optional suffix id[:psol_opts]. For example: ximage.jpg.pagespeed.ic+gp,jw,pj,rj,rp,rw,iq=73. In this scenario, ResourceNamer::id() would just return “ic” so that the PSOL code can remain largely unchanged. But there would be an additional method, ResourceNamer::config(), which would return “gp,jw,pj,rj,rp,rw,iq=73” or an empty string.

  3. We should pass through query-params specified for HTML to the resource request. Today, if I navigate to example.com/foo.html?ModPagespeedJpegRecompressionQuality=73 the image URLs will not carry that query-parameter and we’ll have the same problems this proposal is trying to fix, even for sites that have consistent configuration. We can simply carry forward any such query-parameters verbatim.
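For follow-up 2, splitting the extended ID field might look like the sketch below. It uses the `+` separator from the example URL; `split_id` is a hypothetical helper that mirrors what the proposed ResourceNamer::id() and ResourceNamer::config() would return, not existing PSOL code.

```python
def split_id(id_field):
    """Split an ID such as 'ic+gp,jw,iq=73' into (filter id, options string)."""
    filter_id, _, config = id_field.partition("+")
    return filter_id, config


with_opts = split_id("ic+gp,jw,pj,rj,rp,rw,iq=73")
without_opts = split_id("ic")  # no suffix: config is the empty string
```

Because the options are an optional suffix, an ID without them parses exactly as it does today, which is what lets existing callers of id() stay unchanged.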
