Photon: a super fast open source intelligence crawler written in Python

Photon is an open source intelligence (OSINT) crawler by s0md3v. It can extract the following:

1. Links (internal and external).
2. URLs with parameters, for example pythondict.com/test?id=2.
3. Files (PDF, PNG, XML).
4. Secret keys (accidentally exposed in front-end code).
5. JavaScript files and the endpoints in them (Spring's monitoring endpoints are an important example).
6. Strings that match a custom regular expression.
7. Subdomains and DNS data.
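To make the first two items concrete, here is a minimal sketch of how a crawler might separate internal links from external ones and spot URLs that carry parameters. It is not Photon's own code; the function names `classify_links` and `has_parameters` are hypothetical, and the sketch assumes that "internal" simply means "same host as the start URL", so subdomains count as external.

```python
from urllib.parse import urljoin, urlparse


def classify_links(base_url, hrefs):
    """Split hrefs into internal and external sets relative to base_url."""
    base_host = urlparse(base_url).netloc.lower()
    internal, external = set(), set()
    for href in hrefs:
        absolute = urljoin(base_url, href)  # resolve relative links
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar
        # Assumption: same host means internal, anything else is external
        if parsed.netloc.lower() == base_host:
            internal.add(absolute)
        else:
            external.add(absolute)
    return internal, external


def has_parameters(url):
    """Return True when the URL carries a query string, e.g. /test?id=2."""
    return bool(urlparse(url).query)
```

A real crawler would also have to fetch pages and pull the `href` values out of the HTML; this sketch only covers the classification step.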

You can do a lot with it: crawl images, find vulnerabilities, discover subdomains, scrape data, and so on. The extracted data is neatly organized:

It can also export the results as JSON. You only need to add the export option when entering the command:

python photon.py -u "http://example.com" --export=json
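Once the results are in JSON, they are easy to process from Python. The sketch below assumes the exported file is a single JSON object that maps category names to lists of items; that layout and the function name `summarize_export` are assumptions for illustration, so check the file Photon actually writes before relying on it.

```python
import json
from pathlib import Path


def summarize_export(path):
    """Return {category: item_count} for a JSON file mapping names to lists."""
    # Assumption: the file holds one JSON object whose values are lists
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return {name: len(items) for name, items in data.items()}
```

Pass it the path of the exported file to get a quick count of what was collected in each category.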

Why is it useful for intelligence gathering? Read on.

1. Download and install

You can download the full project from Photon's GitHub repository: github.com/s0md3v/Phot…

Alternatively, follow the Python utility public account and reply "photon" to get a download link hosted on a web disk in China. Once it is downloaded, unzip it wherever you want to use it. If you have not installed Python yet, read this article first: An Ultra-detailed Python Installation Guide.

After installing Python, open CMD (Windows) or Terminal (macOS), go to the folder you just unzipped, and run the following command to install Photon's dependencies:

pip install -r requirements.txt

As shown in the figure:

2. Basic usage

Note that you need to run it from inside the Photon folder. For example, let's extract the URLs of an arbitrary website by entering the following command in the terminal:

python photon.py -u bk.tencent.com/

The results are as follows:

It creates a folder in the current directory named after the domain you are testing, bk.tencent.com in my case:

Let's look inside and see whether the developers left any Easter eggs. Open external.txt, which is where the site's external links are stored. As you can see, it holds not only web pages but also the addresses of files served from CDNs, so external.txt can be a treasure trove.
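A quick way to get an overview of such a file is to count how many links point at each host. The following is a minimal sketch, assuming the file contains one URL per line; the function name `count_hosts` is hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def count_hosts(lines):
    """Count how often each host appears in an iterable of URL lines."""
    hosts = Counter()
    for line in lines:
        url = line.strip()
        if not url:
            continue  # ignore blank lines
        # urlparse only fills netloc when a scheme or "//" is present
        if "://" not in url and not url.startswith("//"):
            url = "//" + url
        host = urlparse(url).netloc.lower()
        if host:
            hosts[host] += 1
    return hosts
```

For example, you could open bk.tencent.com/external.txt, pass the file object to `count_hosts`, and call `.most_common(10)` on the result to see which CDNs and third-party sites the page relies on most.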

You can also find all the open source projects the site links to:

3. Going further

The value of this project lies not only in pulling the data you want quickly, but also in serving as the basis for a capable intelligence system (if you have the skills). That is because the crawl can keep going: starting from the external links, for example, you can find a lot of further information about the site:

This information often fits intelligence requirements better than search engine results do. Not everything can be found through a search engine, but with Photon you can follow the trail to pages that are otherwise hidden on the Internet. Imagine collecting data from many such sites: you could build your own small search engine on top of it using regular expressions.
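As a rough illustration of that idea, the sketch below searches every .txt file under a folder of collected results for lines matching a regular expression. It assumes the results are stored as plain text files, one entry per line, in per-site folders; the function name `search_outputs` is hypothetical, and this is a simple line scan rather than a real search index.

```python
import re
from pathlib import Path


def search_outputs(root, pattern):
    """Yield (file name, line) pairs for lines under root matching pattern."""
    regex = re.compile(pattern, re.IGNORECASE)
    # Walk all .txt files below root in a stable order
    for path in sorted(Path(root).rglob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for line in text.splitlines():
            if regex.search(line):
                yield path.name, line.strip()
```

Calling `list(search_outputs(".", r"github\.com"))` from the folder that holds the crawl results would list every collected line that mentions GitHub, along with the file it came from.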

That's the end of the article. If you enjoyed today's Python tutorial, please keep following us, and give us a thumbs up if it helped you.
