PHP Domain Parser is a Public Suffix List based domain parser implemented in PHP.
While there are plenty of excellent URL parsers and builders available, there are very few projects that can accurately parse a domaine into its component subdomain, registrable domain, and public suffix parts.
Consider the domain www.pref.okinawa.jp. In this domain, the public suffix portion is okinawa.jp, the registrable domain is pref.okinawa.jp, and the subdomain is www. You can't regex that.
PHP Domain Parser is compliant around:
- accurate Public Suffix List based parsing.
- accurate Root Zone Database parsing.
$ composer require jeremykendall/php-domain-parser
You need:
- PHP >= 7.4 but the latest stable version of PHP is recommended
- the
intl
extension
The first objective of the library is to use the Public Suffix List to easily resolve a
domain as a Pdp\ResolvedDomain
object using the following methods:
<?php
use Pdp\Rules;
$rules = Rules::fromPath('/path/to/cache/public-suffix-list.dat');
$resolvedDomain = $rules->resolve('www.PreF.OkiNawA.jP');
echo $resolvedDomain->getDomain(); //display 'www.pref.okinawa.jp';
echo $resolvedDomain->getPublicSuffix(); //display 'okinawa.jp';
echo $resolvedDomain->getSecondLevelDomain(); //display 'pref';
echo $resolvedDomain->getRegistrableDomain(); //display 'pref.okinawa.jp';
echo $resolvedDomain->getSubDomain(); //display 'www';
In case of an error an exception which extends Pdp\CannotProcessHost
is thrown.
The Public Suffix List is organized in sections. This library can give you access to this information via its public suffix object.
<?php
use Pdp\Rules;
$rules = Rules::fromPath('/path/to/cache/public-suffix-list.dat');
$publicSuffix = $rules->resolve('example.github.io')->getPublicSuffix();
echo $publicSuffix; // display 'github.io';
$publicSuffix->isICANN(); // will return false
$publicSuffix->isPrivate(); // will return true
$publicSuffix->isKnown(); // will return true
Because the PSL algorithm is fault tolerant this library exposes more strict methods
Rules::getCookieDomain
Rules::getICANNDomain
Rules::getPrivateDomain
These methods act and resolve the domain against the PSL just like the resolve
method but will throw
an exception if no valid effective TLD is found in the respective PSL section or if the submitted domain is invalid.
<?php
use Pdp\Rules;
$rules = Rules::fromPath('/path/to/cache/public-suffix-list.dat');
$rules->getICANNDomain('qfdsf.unknownTLD');
// will trigger an UnableToResolveDomain exception because `.unknownTLD` is not
// part of the ICANN section
$rules->getCookieDomain('qfdsf.unknownTLD');
// will not throw because the domain syntax is correct.
$rules->getCookieDomain('com');
// will throw because no public suffix can be determined
$rules->resolve('com');
// will return a Nullable Resolved domain
From the ResolvedDomain
you can access the underlying domain object using the ResolvedDomain::getDomain
method.
Accessing this object enables you to work with the domain labels.
WARNING: all objects are immutable, modifying the underlying object will not affect the parent object.
<?php
use Pdp\Rules;
$rules = Rules::fromPath('/path/to/cache/public-suffix-list.dat');
$resolvedDomain = $rules->resolve('www.example.com');
$domain = $resolvedDomain->getDomain();
$domain->labels(); // returns ['com', 'example', 'www'];
$domain->label(-1); // returns 'www'
$domain->label(0); // returns 'com'
foreach ($domain as $label) {
echo $label, PHP_EOL;
}
// display
// com
// example
// www
You can also add or remove labels according to their key index using the following methods:
Domain::withLabel(int $key, string|Stringable $label): self;
Domain::withoutLabel(int $key, int ...$keys): self;
Domain::append(string|Stringable $label): self;
Domain::prepend(string|Stringable $label): self;
While the Public Suffix List is a community based list, the package provides access to the Top Level domain information given by the IANA website to always resolve top domain against all registered TLD even the new ones.
use Pdp\TopLevelDomains;
$iana = TopLevelDomains::fromPath('/path/to/cache/tlds-alpha-by-domain.txt');
$resolvedDomain = $iana->resolve('www.PreF.OkiNawA.jP');
echo $resolvedDomain->getDomain(); //display 'www.pref.okinawa.jp';
echo $resolvedDomain->getPublicSuffix(); //display 'jp';
echo $resolvedDomain->getSecondLevelDomain(); //display 'okinawa';
echo $resolvedDomain->getRegistrableDomain(); //display 'okinawa.jp';
echo $resolvedDomain->getSubDomain(); //display 'www.pref';
In case of an error an exception which extends Pdp\CannotProcessHost
is thrown.
WARNING:
You should never use the library this way in production, without, at least, a caching mechanism to reduce PSL downloads.
Using the Public Suffix List to determine what is a valid domain name and what isn't is dangerous, particularly in these days when new gTLDs are arriving at a rapid pace. The DNS is the proper source for this information.
If you are looking to know the validity of a Top Level Domain, the IANA Root Zone Database is the proper source for this information.
If you must use this library for any of the above purposes, please consider integrating an update mechanism into your software.
Depending on your software the mechanism to store your database may differ, nevertheless, the library comes bundle with a optional service which enables resolving domain name without the constant network overhead of continuously downloading the remote databases.
The Pdp\Storage\PsrStorageFactory
enables returning storage instances that retrieve, convert and cache the Public Suffix List as well as the IANA Root Zone Database using
standard interfaces published by the PHP-FIG to improve its interoperability with any modern PHP codebase.
To work as intended, the Pdp\Storage\PsrStorageFactory
constructor requires:
When creating a new storage instance you will require:
- a
$cachePrefix
argument to optionally add a prefix to your cache index, default to the empty string'''
; - a
$ttl
argument if you need to set the default$ttl
, default tonull
to use the underlying caching default TTL;
The $ttl
argument can be:
- an
int
representing time in second (see PSR-16); - a
DateInterval
object (see PSR-16); - a
DateTimeInterface
object representing the date and time when the item should expire;
However, the package no longer provides any implementation of such interfaces are they are many robust implementations that can easily be found on packagist.org.
THIS IS THE RECOMMENDED WAY OF USING THE LIBRARY
For the purpose of this example we used:
- Guzzle as a PSR-18 implementation HTTP client
- The Symfony Cache Component to use a PSR-16 cache implementation
You could easily use other packages as long as they implement the required PSR interfaces.
<?php
use GuzzleHttp\Client;
use GuzzleHttp\Psr7\Request;
use Pdp\Storage\PsrStorageFactory;
use Psr\Http\Message\RequestFactoryInterface;
use Psr\Http\Message\RequestInterface;
use Symfony\Component\Cache\Adapter\FilesystemAdapter;
use Symfony\Component\Cache\Psr16Cache;
$cache = new Psr16Cache(new FilesystemAdapter('pdp', 3600, __DIR__.'/data'));
$client = new Client();
$requestFactory = new class implements RequestFactoryInterface {
public function createRequest(string $method, $uri): RequestInterface
{
return new Request($method, $uri);
}
};
$cachePrefix = 'pdp_';
$cacheTtl = new DateInterval('P1D');
$factory = new PsrStorageFactory($cache, $client, $requestFactory);
$pslStorage = $factory->createPublicSuffixListStorage($cachePrefix, $cacheTtl);
$rzdStorage = $factory->createRootZoneDatabaseStorage($cachePrefix, $cacheTtl);
$rules = $pslStorage->get(PsrStorageFactory::URL_PSL);
$tldDomains = $rzdStorage->get(PsrStorageFactory::URL_RZD);
It is important to always have an up to date Public Suffix List and Root Zone Database. This library no longer provide an out of the box script to do so as implementing such a job heavily depends on your application setup.
Please see CHANGELOG for more information about what has been changed since version 5.0.0 was released.
Contributions are welcome and will be fully credited. Please see CONTRIBUTING for details.
pdp-domain-parser
has:
- a PHPUnit test suite
- a coding style compliance test suite using PHP CS Fixer.
- a code analysis compliance test suite using PHPStan.
To run the tests, run the following command from the project folder.
$ composer test
If you discover any security related issues, please email [email protected] instead of using the issue tracker.
The MIT License (MIT). Please see License File for more information.
Portions of the Pdp\Converter
and Pdp\Rules
are derivative works of the PHP
registered-domain-libs.
Those parts of this codebase are heavily commented, and I've included a copy of
the Apache Software Foundation License 2.0 in this project.