Contents

  • Proxy
    • How the proxy server works
    • Types of proxy
      • Forward proxy
      • Reverse proxy
      • Transparent proxy
      • References
  • Squid
    • Concept
    • Installation
    • Configuration instructions
      • Configure authentication
      • The configuration file
      • Meaning of the configuration keywords
    • Access control
      • Initialization
  • Problems
    • TCP_MISS/503
  • References
    • Proxy pool
    • Configuration file update program
    • Squid Official Manual
    • Reference example

Proxy

How the proxy server works

The proxy server works as follows:

1. Client A sends a request to the proxy server to access the Internet.
2. After receiving the request, the proxy server matches it against the access rules in the ACL. If a rule allows it, the proxy server searches its cache for the requested resource.
3. If the information client A requested is in the cache, it is returned to client A directly. If it is not in the cache, the proxy server requests it from the Internet.
4. The host on the Internet sends the requested information to the proxy server, which stores it in its cache.
5. The proxy server passes the response from the Internet host back to client A.
6. Client B then requests the same information.
7. The proxy server accepts the request and again matches it against the rules in the access control list.
8. If a rule allows it, the proxy server returns the cached information directly to client B.
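The steps above can be sketched as a toy model in Python. This is only an illustration of the flow (ACL check, cache lookup, fetch from the origin, store, reply), not how Squid is implemented; the class name `CachingProxy`, the client names, and the fake origin function are all made up for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class CachingProxy:
    """Toy model of the flow: ACL check, cache lookup, origin fetch."""
    allowed_clients: Set[str]              # stands in for the ACL rules
    fetch_origin: Callable[[str], str]     # stands in for the Internet host
    cache: Dict[str, str] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def handle(self, client: str, url: str) -> str:
        # Step 2 / 7: match the request against the access rules.
        if client not in self.allowed_clients:
            self.log.append("DENIED")
            raise PermissionError(f"{client} is not allowed to use the proxy")
        # Step 3 / 8: answer from the cache when possible.
        if url in self.cache:
            self.log.append("HIT")
            return self.cache[url]
        # Steps 3-5: fetch from the origin, store the result, then reply.
        body = self.fetch_origin(url)
        self.cache[url] = body
        self.log.append("MISS")
        return body


origin_calls = []


def fake_origin(url: str) -> str:
    """Hypothetical origin server that records how often it is contacted."""
    origin_calls.append(url)
    return f"content of {url}"


proxy = CachingProxy({"client-a", "client-b"}, fake_origin)
proxy.handle("client-a", "http://example.com/index.html")  # steps 1-5
proxy.handle("client-b", "http://example.com/index.html")  # steps 6-8
print(proxy.log)          # ['MISS', 'HIT']
print(len(origin_calls))  # 1
```

The second request never reaches the origin, which is the whole point of putting a cache in the proxy.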

Types of proxy

  • Forward proxy (Controlling Intranet access to the Internet)
  • Reverse proxy (Controlling Internet access to Intranet)
  • Transparent proxy (unencrypted forward proxy)

Forward proxy

A forward proxy lets internal hosts reach the Internet. It provides shared Internet access, caching, and Internet access control for intranet users. The proxy server's IP address and port must be configured on the client.

Forward proxy diagram:

    External network
       |
       |
    modem router (DHCP, SNAT shared Internet access, Internet behavior control, speed limiting, etc.)
       |
       |
    squid proxy (shared Internet access, static page cache acceleration, layer 4 to 7 Internet behavior control for intranet users, speed limiting, etc.)
       |
       |
       |------------------------|
    online user 1          online user 2

Test environment:

    Public network
       |
       |
      br0      172.16.13.250
    squid server
      virbr1   192.168.100.1
       |
       |
    Intranet user VM1, eth0 192.168.100.128 (virbr1)

Reverse proxy

A reverse proxy is the opposite of a forward proxy: clients on the external network access servers on the internal network through it. It is mainly used for cache acceleration or as the CDN layer of a website architecture.

    Client
       |
       |
    reverse proxy (cache acceleration, segmentation, load balancing, session persistence, etc.)
       |
       |
    web

Transparent proxy

The client does not need to set the IP address and port of the proxy server; the proxy is transparent to users.

References

www.cnblogs.com/yanjieli/p/…

Squid

Concept

Squid is caching proxy server software, widely used in load-balanced website architectures. Other common cache servers include Varnish and ATS (Apache Traffic Server).

A forward proxy server covers the case where only one server on the intranet can reach the Internet but every machine on the intranet needs Internet access. It can also serve as a proxy for web crawlers. In practice, Squid is often used as a crawler proxy server to switch between multiple IP addresses.

Installation

yum install -y squid

Configuration instructions

Configure authentication

yum install httpd
# Then run the following command to create the user name and password; the user name here is hello.
# After running the command, enter the password as prompted.
htpasswd -c /etc/squid/passwd hello

The configuration file

(/etc/squid/squid.conf)

acl all src 0.0.0.0/0.0.0.0 # allow access from all IP addresses
acl manager proto http # the manager URL protocol is HTTP
acl localhost src 127.0.0.1/255.255.255.255 # the local machine's IP
acl to_localhost dst 127.0.0.1 # destination address is the local machine's IP
acl CONNECT method CONNECT # request method is CONNECT
#http_reply_access allow all # allow all clients to use this proxy
acl Safe_ports port 80 # port 80 is allowed as a safe port
acl localnet src 10.195.249.225
acl localnet src 10.195.236.141
http_access allow localnet
http_access deny !Safe_ports
acl OverConnLimit maxconn 16 # prevent attacks
http_access deny OverConnLimit
icp_access deny all # refuse ICP requests to and from neighbor caches
ident_lookup_access deny all # disable ident lookups
http_port 8080 transparent # port on which Squid listens for browser client requests
hierarchy_stoplist cgi-bin ? # force certain objects not to be cached, mainly for security
acl QUERY urlpath_regex cgi-bin \?
cache deny QUERY
# cache_mem is a tuning option; a larger value helps caching.
# As a rule of thumb, if the system has n MB of memory, set it to about n/3 MB.
cache_mem 1 GB
fqdncache_size 1024 # FQDN cache size
maximum_object_size_in_memory 2 MB # largest object allowed in memory
memory_replacement_policy heap LFUDA # replacement policy for the memory cache
cache_replacement_policy heap LFUDA # replacement policy for the disk cache
# ufs cache directory, at most 5000 MB, with 32 first-level and 512 second-level directories
cache_dir ufs /home/cache 5000 32 512
max_open_disk_fds 0 # maximum number of open files allowed; 0 means unlimited
minimum_object_size 1 KB # smallest object size that is cached
maximum_object_size 20 MB # largest object size that is cached
cache_swap_low 90 # low-water mark for cache space: 90%
cache_swap_high 95 # high-water mark for cache space: 95%
ipcache_size 2048 # IP address cache size
ipcache_low 90 # ipcache low-water mark: 90%
ipcache_high 95 # ipcache high-water mark: 95%
access_log /var/log/squid/access.log squid # access log location
cache_log /var/log/squid/cache.log
cache_store_log none
# Make Squid write access records in the web server log format.
# Needed if you want to use a web access log analyzer.
emulate_httpd_log on
refresh_pattern . 0 20% 4320 override-expire override-lastmod reload-into-ims ignore-reload # cache refresh rule
acl buggy_server url_regex ^http://....
broken_posts allow buggy_server
acl apache rep_header Server ^Apache
broken_vary_encoding allow apache
request_entities off # prevent attacks
header_access header allow all
relaxed_header_parser on # do not parse HTTP headers strictly
client_lifetime 120 minute
cache_mgr [email protected] # address that receives alerts when the cache has a problem
cache_effective_user squid # run the Squid server as user squid
cache_effective_group squid
# icp_port is the port Squid uses to exchange ICP requests with neighbor caches.
# It is set to 0 (disabled) because Squid is configured here as an accelerator
# for the internal web server, so neighbor caches are not needed.
icp_port 0
# cache_peer sets the host that is allowed to update the cache; it is this machine, so 127.0.0.1
cache_peer 127.0.0.1 parent 80 0 no-query default multicast-responder no-netdb-exchange
cache_peer_domain 127.0.0.1
hostname_aliases 127.0.0.1
error_directory /usr/share/squid/errors/Simplify_Chinese # directory of the error pages
always_direct allow all # allow all requests to be forwarded directly to the origin server
max_filedesc 2048 # maximum number of open file descriptors
coredump_dir /var/log/squid
# half_closed_clients off makes Squid close the client connection immediately when read
# no longer returns data. Sometimes read returns no data because the client has closed
# the sending side of the TCP connection while still receiving. Squid cannot tell a
# half-closed TCP connection from a fully closed one.
half_closed_clients off

For a crawler proxy, we only need to run one Squid proxy and have it forward requests to the other proxies in rotation. How do we make Squid forward requests in rotation automatically?

Add this line:

cache_peer 120.xx.xx.32 parent 80 0 no-query weighted-round-robin weight=2 connect-fail-limit=2 allow-miss max-conn=5 name=proxy-90

If Squid reports "cache_peer 120.xx.xx.32 specified twice", the same host appears in more than one cache_peer line; give each line a different name= value.
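With many upstream proxies, it is easier to generate the cache_peer lines than to write them by hand. Below is a minimal Python sketch that reuses the options from the cache_peer line above and gives each peer its own name= value. The function name `cache_peer_lines` and the upstream addresses (taken from the 192.0.2.0/24 documentation range) are assumptions for illustration only.

```python
from typing import Iterable, List, Tuple


def cache_peer_lines(
    upstreams: Iterable[Tuple[str, int]],
    weight: int = 2,
    fail_limit: int = 2,
    max_conn: int = 5,
) -> List[str]:
    """Build one cache_peer line per (host, port), each with a unique name."""
    lines = []
    for index, (host, port) in enumerate(upstreams, start=1):
        lines.append(
            f"cache_peer {host} parent {port} 0 no-query weighted-round-robin "
            f"weight={weight} connect-fail-limit={fail_limit} allow-miss "
            f"max-conn={max_conn} name=proxy-{index}"
        )
    return lines


# Hypothetical upstream proxies; replace with real ones.
lines = cache_peer_lines([("192.0.2.10", 3128), ("192.0.2.11", 3128)])
for line in lines:
    print(line)
```

The output can be appended to the configuration file; Squid has to reload its configuration before the new peers take effect.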

Meaning of the configuration keywords

Syntax: cache_peer <web server address> <server type> <HTTP port> <ICP port> [options]. The options are as follows:

  • proxy-only: data fetched from the peer is not cached locally. By default Squid caches it.
  • weight=n: used when you have multiple peers. If more than one peer has the data you requested, Squid derives each peer's weight from its ICP response time and sends the request to the peer with the largest weight. The larger the weight, the higher the priority. You can also set the weight manually.
  • no-query: do not send ICP requests to the peer. Use this option when ICP is not available on the peer.
  • default: like the default route in a routing table, this peer is used as a last resort. When you have only one parent proxy server and it does not support ICP, use the default and no-query options to send all requests to that parent.
  • login=user:password: use this option when the parent proxy server requires user authentication.

After updating the configuration, save it and restart Squid; the proxy is then ready to use.

Access control

Squid access control list (ACL) examples:

# Deny Internet access to the intranet host 192.168.100.128/32
acl denyip src 192.168.100.128/32
http_access deny denyip

acl denyip src 192.168.100.128 192.168.100.132/255.255.255.255
http_access deny denyip

acl vip arp 00:0C:29:79:0C:1A
http_access allow vip

# Block the website at this external IP address
acl baddsturl2 dst 220.11.22.33
http_access deny baddsturl2

# Block www.163.com and WWW.163.COM (-i makes the match case-insensitive);
# war.163.com and sports.163.com are still reachable
acl baddsturl dstdomain -i www.163.com
http_access deny baddsturl

# Block every domain name that contains 163
acl baddsturl dstdom_regex -i 163
http_access deny baddsturl

# If there are too many sites, put them in a file, one site to block per line
acl baddsturl dstdom_regex "/etc/squid/baddsturl"
http_access deny baddsturl

# Block URLs that contain baidu
acl baddsturl3 url_regex -i baidu
http_access deny baddsturl3

# Prohibit downloading files with the listed suffixes
acl badfile urlpath_regex -i \.mp3$ \.rmvb$ \.exe$ \.zip$ \.mp4$ \.avi$ \.rar$
http_access deny badfile

# Deny Internet access to the 192.168.100.0 network segment during working hours
acl badipclient2 src 192.168.100.0/255.255.255.0
acl worktime time MTWHF 9:00-17:00
http_access deny badipclient2 worktime

# Limit this client to a maximum of 5 connections
acl badipclient3 src 192.168.100.128
acl conn5 maxconn 5
http_access deny badipclient3 conn5
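When several ACL names appear on one http_access line, all of them must match for the rule to apply. The working-hours rule (`http_access deny badipclient2 worktime`) can be modelled in a few lines of Python to make that AND logic explicit. This is only a sketch of the matching logic, not Squid code; the function names are made up, and treating 17:00 itself as outside working hours is an assumption.

```python
from datetime import datetime, time
from ipaddress import ip_address, ip_network

# acl badipclient2 src 192.168.100.0/255.255.255.0
BAD_NET = ip_network("192.168.100.0/255.255.255.0")


def is_worktime(now: datetime) -> bool:
    """acl worktime time MTWHF 9:00-17:00 (Monday to Friday)."""
    # weekday(): Monday is 0, Friday is 4. The end boundary is an assumption.
    return now.weekday() < 5 and time(9, 0) <= now.time() < time(17, 0)


def is_denied(client_ip: str, now: datetime) -> bool:
    """http_access deny badipclient2 worktime: both ACLs must match."""
    return ip_address(client_ip) in BAD_NET and is_worktime(now)


wednesday_morning = datetime(2020, 4, 15, 10, 0)
saturday_morning = datetime(2020, 4, 18, 10, 0)

print(is_denied("192.168.100.128", wednesday_morning))  # True
print(is_denied("192.168.100.128", saturday_morning))   # False
print(is_denied("10.0.0.1", wednesday_morning))         # False
```

A client outside the network segment, or a request outside working hours, does not match this rule and falls through to the next http_access line.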

www.cnblogs.com/wangxiaoqia…

Initialization

After modifying the configuration file, save it and run the following command to initialize Squid:

squid -z

Problems

TCP_MISS/503

The log contains the following information

1587003941.248 0 172.25.0.1 TCP_MISS/503 4362 GET http://gtj.hangzhou.gov.cn/col/col1363087/index.html - HIER_NONE/- text/html
1587003942.505 0 172.25.0.1 TCP_MISS/503 4362 GET http://gtj.hangzhou.gov.cn/col/col1363087/index.html - HIER_NONE/- text/html
1587003943.779 301 172.25.0.1 TCP_MISS/200 388 GET http://httpbin.org/ip - HIER_DIRECT/34.230.193.231 application/json
1587003943.899 0 172.25.0.1 TCP_MISS/503 4357 GET http://gtj.hangzhou.gov.cn/col/col1363087/index.html - HIER_NONE/- text/html
1587003945.333 0 172.25.0.1 TCP_MISS/503 4362 GET http://gtj.hangzhou.gov.cn/col/col1363087/index.html - HIER_NONE/- text/html
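Each line of Squid's native access log has ten whitespace-separated fields: timestamp, elapsed time in milliseconds, client address, result code/HTTP status, size in bytes, method, URL, user, hierarchy code/peer, and content type. A small Python sketch for splitting such a line follows; the names `AccessLogEntry` and `parse_access_log_line` are made up for this example, and it assumes the default native log format.

```python
from typing import NamedTuple


class AccessLogEntry(NamedTuple):
    timestamp: float
    elapsed_ms: int
    client: str
    result_code: str
    http_status: int
    size: int
    method: str
    url: str
    user: str
    hierarchy: str
    peer: str
    content_type: str


def parse_access_log_line(line: str) -> AccessLogEntry:
    """Split one native-format access.log line into named fields."""
    fields = line.split()
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}")
    result_code, status = fields[3].split("/", 1)
    hierarchy, peer = fields[8].split("/", 1)
    return AccessLogEntry(
        timestamp=float(fields[0]),
        elapsed_ms=int(fields[1]),
        client=fields[2],
        result_code=result_code,
        http_status=int(status),
        size=int(fields[4]),
        method=fields[5],
        url=fields[6],
        user=fields[7],
        hierarchy=hierarchy,
        peer=peer,
        content_type=fields[9],
    )


sample = (
    "1587003941.248 0 172.25.0.1 TCP_MISS/503 4362 GET "
    "http://gtj.hangzhou.gov.cn/col/col1363087/index.html - HIER_NONE/- text/html"
)
entry = parse_access_log_line(sample)
print(entry.result_code, entry.http_status, entry.hierarchy)  # TCP_MISS 503 HIER_NONE
```

In the failing lines the hierarchy field is HIER_NONE/-, meaning Squid did not contact any server for the request, while the successful httpbin.org request shows HIER_DIRECT and the server's IP address.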

The key part is TCP_MISS/503.

A Google search turned up this thread: forums.freebsd.org/threads/341…

Solution:

Add this line to /etc/squid/squid.conf: dns_v4_first on

Then try again.

If that doesn’t work, change the system configuration

Modify /etc/sysconfig/network: set NETWORKING_IPV6 to no

(Reboot is best)

References

cn.linux.vbird.org/linux_serve…

Proxy pool

github.com/AaronJny/op…

Configuration file update program

github.com/xNathan/squ…

Documentation of the above projects

xnathan.com/2017/03/01/…

xnathan.com/2017/02/28/…

xnathan.com/2017/03/02/…

Squid Official Manual

zyan.cc/book/squid/…

Reference example

rookiefly.cn/detail/192