constverum/ProxyBroker


ProxyBroker

ProxyBroker is an asynchronous tool that finds public proxies from multiple sources and concurrently checks them (type, anonymity level, country). Supports HTTP(S) and SOCKS.

https://raw.githubusercontent.com/constverum/ProxyBroker/master/proxybroker/data/example.gif

  • Finds proxies on 50+ sources (~7k working proxies)
  • Identifies proxies in raw input data
  • Checks that proxies work with the protocols HTTP, HTTPS, SOCKS4 and SOCKS5
  • Checks the anonymity level of each proxy
  • Removes duplicates
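The anonymity levels ProxyBroker reports (Transparent, Anonymous, High) are commonly defined by what a proxy judge sees in the request it receives. A minimal sketch of that convention, assuming the judge echoes back the request headers; `classify_anonymity` and the `PROXY_HEADERS` list are hypothetical and are not ProxyBroker's own implementation:

```python
from typing import Mapping

# Hypothetical list of headers that reveal a request went through a proxy.
PROXY_HEADERS = ("via", "x-forwarded-for", "forwarded", "x-real-ip", "proxy-connection")


def classify_anonymity(echoed_headers: Mapping[str, str], real_ip: str) -> str:
    """Classify a proxy from the headers a judge echoed back (assumed convention)."""
    # Normalise header names, since HTTP header names are case-insensitive.
    headers = {name.lower(): value for name, value in echoed_headers.items()}
    # Transparent: the client's real IP leaks in some header value
    # (a simple substring test, kept short for the sketch).
    if any(real_ip in value for value in headers.values()):
        return "Transparent"
    # Anonymous: the IP is hidden, but the proxy announces itself.
    if any(name in headers for name in PROXY_HEADERS):
        return "Anonymous"
    # High: no trace of the client or of the proxy.
    return "High"


if __name__ == "__main__":
    real = "203.0.113.7"
    print(classify_anonymity({"X-Forwarded-For": real}, real))      # Transparent
    print(classify_anonymity({"Via": "1.1 squid"}, real))           # Anonymous
    print(classify_anonymity({"User-Agent": "Mozilla/5.0"}, real))  # High
```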

To install ProxyBroker, simply:

$ pip install proxybroker

Basic example:

import asyncio
from proxybroker import Broker

loop = asyncio.get_event_loop()

proxies = asyncio.Queue()
broker = Broker(proxies, loop=loop)

# Search for proxies and check them; working proxies are put into the queue.
loop.run_until_complete(broker.find())

# The broker puts None into the queue when it has finished.
while True:
    proxy = proxies.get_nowait()
    if proxy is None:
        break
    print('Found proxy: %s' % proxy)

As a result, you get proxy objects:

Found proxy: <Proxy AU 0.72s [HTTP: Transparent] 1.1.1.1:80>
Found proxy: <Proxy FR 0.33s [HTTP: High, HTTPS] 2.2.2.2:3128>
Found proxy: <Proxy US 1.11s [HTTP: Anonymous, HTTPS] 8.8.8.8:8000>
Found proxy: <Proxy -- 0.45s [SOCKS4, SOCKS5] 192.168.1.2:1080>
...

Advanced example:

import asyncio
from proxybroker import Broker

async def use_example(pQueue):
    # Consume proxies as soon as they are found; None marks the end.
    while True:
        proxy = await pQueue.get()
        if proxy is None:
            break
        print('Received: %s' % proxy)

async def find_advanced_example(pQueue, loop):
    broker = Broker(queue=pQueue,
                    timeout=8,
                    attempts_conn=3,
                    max_concurrent_conn=200,
                    judges=['https://httpheader.net/', 'http://httpheader.net/'],
                    providers=['http://www.proxylists.net/', 'http://fineproxy.org/eng/'],
                    verify_ssl=False,
                    loop=loop)

    # HTTP: only the Anonymous and High anonymity levels;
    # HTTPS, SOCKS4, SOCKS5: no anonymity-level filter.
    types = [('HTTP', ('Anonymous', 'High')), 'HTTPS', 'SOCKS4', 'SOCKS5']
    countries = ['US', 'GB', 'DE']
    limit = 10

    await broker.find(types=types, countries=countries, limit=limit)

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    pQueue = asyncio.Queue()
    # Start searching and checking; at the same time, another part
    # of the program uses the proxies as they are received.
    tasks = asyncio.gather(find_advanced_example(pQueue, loop), use_example(pQueue))
    loop.run_until_complete(tasks)
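The advanced example is a producer/consumer pattern: `find()` fills the queue while `use_example()` drains it, and `None` signals the end. A self-contained sketch of the same pattern, with a hypothetical `fake_find()` coroutine standing in for `Broker.find()` so it runs without network access (it uses `asyncio.run`, which needs Python 3.7+):

```python
import asyncio
from typing import List, Optional


async def fake_find(queue: "asyncio.Queue[Optional[str]]", limit: int) -> None:
    """Hypothetical stand-in for Broker.find(): produce results, then None."""
    for i in range(limit):
        await asyncio.sleep(0.01)  # simulate the time spent checking a proxy
        await queue.put(f"10.0.0.{i + 1}:80")
    await queue.put(None)  # end-of-results marker the consumer waits for


async def consume(queue: "asyncio.Queue[Optional[str]]") -> List[str]:
    """Handle each result as soon as it arrives; stop at the None marker."""
    received = []
    while True:
        proxy = await queue.get()
        if proxy is None:
            break
        received.append(proxy)
    return received


async def main() -> List[str]:
    queue: "asyncio.Queue[Optional[str]]" = asyncio.Queue()
    # Run the producer and the consumer concurrently.
    _, received = await asyncio.gather(fake_find(queue, limit=3), consume(queue))
    return received


if __name__ == "__main__":
    print(asyncio.run(main()))  # ['10.0.0.1:80', '10.0.0.2:80', '10.0.0.3:80']
```

Because the consumer awaits `queue.get()`, it never busy-waits and can start handling the first proxy before the producer has finished.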

Using an existing list of proxies (raw data):

import asyncio
from proxybroker import Broker

loop = asyncio.get_event_loop()

proxies = asyncio.Queue()
broker = Broker(proxies, loop=loop)

data = '''10.0.0.1:80
          OK 10.0.0.2:   80 HTTP 200 OK 1.214
          10.0.0.3;80;SOCKS5 check date 21-01-02
          >>>10.0.0.4@80 HTTP HTTPS status OK
          ...'''

# Note: at the moment, information about proxy types in the raw data is ignored
loop.run_until_complete(broker.find(data=data))

# Collect the found proxies, skipping the None end marker
found_proxies = [p for p in (proxies.get_nowait() for _ in range(proxies.qsize()))
                 if p is not None]
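One way to pick `host:port` pairs out of loosely formatted text like the `data` string above is a regular expression that allows a few non-digit separator characters between the IP and the port. A minimal sketch, assuming IPv4 addresses only; `extract_proxies` is hypothetical and is not ProxyBroker's own parser. It also drops duplicates while keeping first-seen order:

```python
import ipaddress
import re
from typing import List, Tuple

# IPv4 address, then 1-5 separator characters (":", ";", "@", spaces...),
# then a port. The lookarounds stop a match from starting or ending mid-number.
PROXY_RE = re.compile(r"(?<![\d.])(\d{1,3}(?:\.\d{1,3}){3})[^\d\n]{1,5}?(\d{1,5})(?!\d)")


def extract_proxies(text: str) -> List[Tuple[str, int]]:
    """Return unique (host, port) pairs found in raw text, in first-seen order."""
    found = []
    for host, port in PROXY_RE.findall(text):
        try:
            ipaddress.IPv4Address(host)  # rejects octets above 255
        except ipaddress.AddressValueError:
            continue
        if 0 < int(port) <= 65535:
            found.append((host, int(port)))
    # dict.fromkeys() removes duplicates while preserving insertion order.
    return list(dict.fromkeys(found))


if __name__ == "__main__":
    raw = """10.0.0.1:80
    OK 10.0.0.2:   80 HTTP 200 OK 1.214
    10.0.0.3;80;SOCKS5 check date 21-01-02
    >>>10.0.0.4@80 HTTP HTTPS status OK
    10.0.0.1:80"""
    # [('10.0.0.1', 80), ('10.0.0.2', 80), ('10.0.0.3', 80), ('10.0.0.4', 80)]
    print(extract_proxies(raw))
```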
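
Only grabbing proxies, without checking them:

# ... inside a coroutine, with a Broker instance created as above:
await broker.grab(countries=['US'], limit=100)
# ...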

Proxy information:

| Property | Type | Example | Description |
|---|---|---|---|
| host | str | '8.8.8.8' | The IP address of the proxy |
| port | int | 80 | The port of the proxy |
| types | dict | {'HTTP': 'Anonymous', 'HTTPS': None} | Supported protocols and their anonymity levels |
| geo | dict | {'code': 'US', 'name': 'United States'} | ISO code and full name of the country where the proxy is located |
| avgRespTime | str | '1.11' | Average response time of the proxy, as a string |
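To store or sort results outside ProxyBroker, one option is a small record type that mirrors the table above. A sketch; `ProxyRecord` and `fastest` are hypothetical names, and `avgRespTime` is converted from its documented string form to a float so records can be ordered. The sample values are taken from the example output earlier in this README:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ProxyRecord:
    """Hypothetical record mirroring the documented Proxy properties."""
    host: str
    port: int
    types: Dict[str, Optional[str]] = field(default_factory=dict)
    geo: Dict[str, str] = field(default_factory=dict)
    avgRespTime: str = "0"  # documented as a string, e.g. '1.11'

    @property
    def resp_time(self) -> float:
        # Convert the string form to a number so records can be compared.
        return float(self.avgRespTime)


def fastest(records: List[ProxyRecord], n: int) -> List[ProxyRecord]:
    """Return the n records with the lowest average response time."""
    return sorted(records, key=lambda r: r.resp_time)[:n]


if __name__ == "__main__":
    records = [
        ProxyRecord("8.8.8.8", 8000, {"HTTP": "Anonymous", "HTTPS": None},
                    {"code": "US", "name": "United States"}, "1.11"),
        ProxyRecord("2.2.2.2", 3128, {"HTTP": "High", "HTTPS": None},
                    {"code": "FR", "name": "France"}, "0.33"),
    ]
    print([f"{r.host}:{r.port}" for r in fastest(records, 1)])  # ['2.2.2.2:3128']
```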

Broker parameters:

| Parameter | Required | Type | Default | Description |
|---|---|---|---|---|
| queue | Yes | asyncio.Queue | | Queue to which found proxies are added |
| timeout | No | int | 8 | Timeout, in seconds, applied to all network operations |
| attempts_conn | No | int | 3 | Maximum number of connection attempts |
| max_concurrent_conn | No | int or asyncio.Semaphore | 200 | Maximum number of concurrent connections, given as a number or as a semaphore already used in your program |
| providers | No | list of str | list of ~50 sites | Sites that publish proxy lists (proxy providers) |
| judges | No | list of str | list of ~10 sites | Sites that display the HTTP headers of a request (proxy judges) |
| verify_ssl | No | bool | False | Whether to verify SSL certificates |
| loop | No | asyncio event loop | None | Event loop to use |
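The `timeout`, `attempts_conn` and `max_concurrent_conn` parameters describe a common pattern for bounded, retrying network checks. A sketch of that pattern in plain asyncio, not ProxyBroker's internals: a semaphore caps how many checks run at once, `asyncio.wait_for` applies the timeout to each attempt, and a failed or timed-out attempt is retried up to the attempt limit. `check_with_limits` and the `fake_check` coroutine are hypothetical stand-ins for real connection attempts:

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Union


async def check_with_limits(
    hosts: List[str],
    check_once: Callable[[str], Awaitable[bool]],
    timeout: float = 8,
    attempts_conn: int = 3,
    max_concurrent_conn: Union[int, asyncio.Semaphore] = 200,
) -> Dict[str, bool]:
    """Check every host and return a mapping host -> whether any attempt succeeded."""
    # Reuse a semaphore the caller already has, or create one from the number.
    if isinstance(max_concurrent_conn, asyncio.Semaphore):
        sem = max_concurrent_conn
    else:
        sem = asyncio.Semaphore(max_concurrent_conn)

    async def check(host: str) -> bool:
        async with sem:  # at most this many checks hold a slot at once
            for _ in range(attempts_conn):
                try:
                    # wait_for cancels the attempt if it exceeds the timeout.
                    if await asyncio.wait_for(check_once(host), timeout):
                        return True
                except (asyncio.TimeoutError, OSError):
                    pass  # a timeout or connection error counts as a failed attempt
            return False

    results = await asyncio.gather(*(check(h) for h in hosts))
    return dict(zip(hosts, results))


async def demo() -> Dict[str, bool]:
    calls: Dict[str, int] = {}

    async def fake_check(host: str) -> bool:
        # Hypothetical check: "slow" never answers in time, "retry" fails once.
        calls[host] = calls.get(host, 0) + 1
        if host == "slow":
            await asyncio.sleep(1)
        return host != "retry" or calls[host] > 1

    return await check_with_limits(
        ["ok", "retry", "slow"], fake_check,
        timeout=0.05, attempts_conn=2, max_concurrent_conn=2,
    )


if __name__ == "__main__":
    print(asyncio.run(demo()))  # {'ok': True, 'retry': True, 'slow': False}
```

Accepting an existing semaphore lets several components share one global connection budget, which matches the documented option of passing a semaphore already used in your program.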

Broker methods:

| Method | Optional parameters | Description |
|---|---|---|
| find | data, types, countries, limit | Searches for proxies and checks them against the requested parameters. |
| grab | countries, limit | Only searches for proxies, without checking whether they work. |
| show_stats | full | Shows statistics. |

| Method | Parameter | Description |
|---|---|---|
| find | data | Raw data to use as the source of proxies. When given, proxy provider sites are not searched. Empty by default. |
| find | types | List of types (protocols) to check. To restrict anonymity levels, use a tuple (Type, AnonLvl), where AnonLvl may itself be a tuple of levels. By default, all types are checked at all anonymity levels. |
| find | countries | List of ISO country codes where the proxies must be located. |
| find | limit | Stop after this number of working proxies has been found. |
| grab | countries | List of ISO country codes where the proxies must be located. |
| grab | limit | Stop after this number of proxies has been found. |
| show_stats | full | If False (default), shows a short version of the stats without the proxies log; if True, shows the full version with the proxies log. |
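The `types` argument mixes plain protocol names with `(Type, AnonLvl)` tuples. A sketch of how such a specification could be matched against a proxy's `types` dict (as shown in the proxy information table); `matches_types` is hypothetical, and both "a bare name accepts any level" and "one matching entry is enough" are assumptions drawn from the description above, not from ProxyBroker's source:

```python
from typing import Dict, Optional, Sequence, Tuple, Union

# A bare protocol name, or (protocol, level) / (protocol, (level, level, ...)).
TypeSpec = Union[str, Tuple[str, Union[str, Sequence[str]]]]


def matches_types(proxy_types: Dict[str, Optional[str]], spec: Sequence[TypeSpec]) -> bool:
    """Return True if the proxy supports at least one requested type/level."""
    for item in spec:
        if isinstance(item, str):
            # Bare protocol name: any anonymity level is accepted.
            if item in proxy_types:
                return True
            continue
        proto, levels = item
        if isinstance(levels, str):
            levels = (levels,)  # allow ('HTTP', 'High') as well as a tuple of levels
        if proto in proxy_types and proxy_types[proto] in levels:
            return True
    return False


if __name__ == "__main__":
    wanted = [('HTTP', ('Anonymous', 'High')), 'HTTPS']
    print(matches_types({'HTTP': 'Transparent'}, wanted))                 # False
    print(matches_types({'HTTP': 'High'}, wanted))                        # True
    print(matches_types({'HTTP': 'Transparent', 'HTTPS': None}, wanted))  # True
```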

TODO:

  • Check the ping, response time and speed of data transfer
  • Check that proxies work with Cookies/Referer/POST
  • Check site access (Google, Twitter, etc.)
  • Check proxies against spam databases (DNSBL)
  • Information about uptime
  • Checksum of data returned
  • Support for proxy authentication
  • Find the outgoing IP for cascading proxies
  • Check whether a proxy can send mail (open port 25, SMTP)
  • Allow specifying a proxy address without a port (try connecting on default ports)
  • Allow saving working proxies to a file (text/json/xml)

Licensed under the Apache License, Version 2.0

This product includes GeoLite2 data created by MaxMind, available from http://www.maxmind.com.