Currently frontera passes a URL to URL_FINGERPRINT_FUNCTION that has already been canonicalized by w3lib's canonicalize_url function. If the API is changed to pass the raw URL instead, users will be able to use canonicalize_url options such as keep_fragments=True (which can be desirable, e.g. for Splash), or swap out the canonicalization implementation altogether. This change would be backwards incompatible, although it can be made backwards compatible if desired (e.g. by using a different setting).
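To illustrate the proposal, here is a minimal sketch of a fingerprint function that receives the raw URL and does its own canonicalization. It uses only the standard library: `simple_canonicalize` and its `keep_fragments` flag are hypothetical stand-ins for a real canonicalizer such as w3lib's canonicalize_url, and `fragment_aware_fingerprint` is an illustrative name, not part of frontera.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def simple_canonicalize(url, keep_fragments=False):
    """Hypothetical stand-in for a canonicalizer (not w3lib's implementation).

    Lowercases the host, sorts query arguments, and drops the fragment
    unless keep_fragments is True.
    """
    parts = urlsplit(url)
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    fragment = parts.fragment if keep_fragments else ""
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", query, fragment)
    )


def fragment_aware_fingerprint(raw_url):
    """Fingerprint a raw URL, keeping the fragment as part of its identity.

    Assumes the caller passes the URL before any canonicalization,
    which is the behavior proposed in this issue.
    """
    canonical = simple_canonicalize(raw_url, keep_fragments=True)
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()
```

With a function like this, two URLs that differ only in their fragment (as is common for JavaScript-rendered pages) get different fingerprints, while URLs that differ only in query argument order still collapse to the same one.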
Currently frontera passes a URL to URL_FINGERPRINT_FUNCTION that has already been canonicalized by w3lib's canonicalize_url function.
Only if the URL comes from Scrapy's link extractor with canonicalization enabled. There is also a create_request method, which is used when adding new seeds, generating new URLs in CS, and discovering URLs from sitemaps. In other words, this is not always true.