Preface
The reason I had this idea is that I prefer reading PDFs locally and have a tendency to hoard things. The core goal is to keep only the content we need on a web page, strip out the irrelevant information, and then press Ctrl + P to call up the system print dialog and generate a PDF file. So, here we go.
IDEA 1
On our own site we can do whatever we want, but the moment we try to embed someone else's site into ours, we hit the familiar exception: a cross-origin warning!
However, an nginx reverse proxy handles cross-origin perfectly. The theory holds; let practice begin.
Open the nginx/conf/nginx.conf file and start editing under the http node.
First, add an alias (an upstream block) for the target domain:
```nginx
upstream mysql {
    server www.runoob.com:80;
}
```
Then set the reverse proxy parameters:
```nginx
server {
    listen 127.0.0.1:8001;
    location / {
        proxy_pass http://mysql;
        proxy_set_header Host $http_host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        if ($request_method = 'OPTIONS') {
            return 204;
        }
    }
}
```
Start the nginx service with the `start nginx` command.
My first reaction: wait, what?! After consulting the almighty Baidu, I got my answer.
Knock on the blackboard!! If a site is served over plain HTTP, anyone can use an nginx reverse proxy to mirror it outright, while modifying its content at will!
The typical scenario: the real site is A.com. We stand up a B.com that reverse-proxies to A.com while injecting ads or even trojans. A user who stumbles onto B.com thinks they are visiting the official site A, when in fact they are browsing our doctored mirror. The risk is self-evident.
So is the nginx reverse-proxy plan a bust? Wait, let me ask Baidu whether nginx can reverse-proxy an HTTPS site.
There are plenty of solutions; I'll leave them for later tinkering (too lazy to dig in now, since the point here is just to get the problem solved).
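For reference, the gist of what turns up: nginx can proxy an HTTPS upstream directly from `proxy_pass`. A minimal sketch, assuming the same runoob target as above (the directives are standard nginx, but I have not verified this exact setup):

```nginx
# Sketch only: proxy a local HTTP port to an HTTPS upstream.
server {
    listen 127.0.0.1:8001;
    location / {
        proxy_pass https://www.runoob.com;
        # send the upstream's own Host header, not ours
        proxy_set_header Host www.runoob.com;
        # enable SNI so the upstream's TLS handshake succeeds
        proxy_ssl_server_name on;
    }
}
```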
IDEA 2
We can do this by writing a Tampermonkey (油猴) userscript:

- Pull the `url` of every entry in the menu on the left, record them, then empty all elements in `body`
- Traverse the recorded `url`s, request each one, and embed the fetched `html` elements into the `body` node in turn
- Inspect the fetched elements, resize the window, and note the elements that need to be removed
All right, we’re clear. Let’s go
```javascript
// ==UserScript==
// @name     Export runoob (菜鸟教程) docs
// @version  1
// @grant    none
// @match    https://www.runoob.com/mysql/*
// ==/UserScript==

// You can also paste this into the console by hand; once everything has loaded, print directly with Ctrl + P
let hrefs = document.getElementById('leftcolumn').children;
```
As for why I only wrote this much: my vanilla JS just isn't up to it…
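For completeness, here is a rough vanilla-JS sketch of how the loop could continue, written after the fact (it assumes the same `#leftcolumn` menu and the `.article-body` container used in IDEA 3; untested):

```javascript
// Sketch only: fetch every page linked in the left menu and
// embed each article into the current (emptied) body.
(async () => {
    const hrefs = [...document.querySelectorAll('#leftcolumn a')].map(a => a.href);
    document.body.innerHTML = '';
    for (const href of hrefs) {
        const html = await (await fetch(href)).text(); // same-origin, so no cross-domain issue
        const doc = new DOMParser().parseFromString(html, 'text/html');
        const article = doc.querySelector('.article-body');
        if (article) document.body.appendChild(article); // the node is adopted into this document automatically
    }
    console.log('All pages loaded; Ctrl + P when ready');
})();
```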
IDEA 3
Since we've established that the solution is to write a script, why not just run it straight from the console…
Note: Chrome supports saving code snippets in DevTools, so the browser keeps the code for us and we can run it directly!
The finished code is shown below:
```javascript
{
    let hrefs = [];
    $('#leftcolumn>a').each(function () {
        hrefs.push($(this).attr('href'));
    });
    $('body').empty();
    $('body').append('<div id="down-pdf-page"></div>');
    let $pages = $('#down-pdf-page');
    // hrefs = [hrefs[0], hrefs[1]]; // uncomment to test on the first two pages only
    hrefs.forEach((h, i) => {
        // keep a reference to this page's own div, so the async load
        // callback fills the right container instead of always the last one
        let $page = $('<div></div>').appendTo($pages);
        try {
            $page.load(h, function () {
                let $article = $page.find('.article-body');
                $page.empty().append($article);
                console.log('Progress:', (i + 1) / hrefs.length);
            });
        } catch (e) {
        }
    });
    // periodically strip the elements we don't want in the PDF
    setInterval(() => {
        $('#postcomments').remove();
        $('.feedback-btn').remove();
        $('.previous-next-links').remove();
        $('#respond').remove();
        $('#comments').remove();
    }, 2000);
}
```
This code was written for runoob (菜鸟教程). Open any of its technical docs, such as this MySQL tutorial, open the console, paste the code in, wait until every document page has finished loading, then Ctrl + P and save as PDF.
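If eyeballing "all loaded" feels unreliable, a variant of the loading part (my own sketch, not in the original script) counts completed loads and announces when everything is in:

```javascript
// Sketch only: same loading logic, plus an explicit completion counter.
const hrefs = $('#leftcolumn>a').map(function () { return $(this).attr('href'); }).get();
$('body').empty().append('<div id="down-pdf-page"></div>');
let loaded = 0;
hrefs.forEach(h => {
    const $page = $('<div></div>').appendTo('#down-pdf-page');
    $page.load(h, () => {
        loaded += 1;
        if (loaded === hrefs.length) console.log('All pages loaded — safe to Ctrl + P');
    });
});
```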
As the pages come in, a timed function cleans up the elements that need removing (after all, chasing bugs is painful, right?).
Conclusion
At this point the goal is basically achieved: the online docs are saved as a local PDF. One small problem remains, though: the PDF has no bookmarks, which is a fly in the ointment. Looking through the document, the headings are all there, yet the PDF never generates bookmarks from them, which is quite uncomfortable; Markdown documents, by contrast, generate an outline from the headings automatically.
Given that, one idea is to generate a dedicated table-of-contents page whose entries link to the corresponding headings; the technical detail is anchor positioning with the a tag. This needs further study, but it is theoretically feasible.
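A quick sketch of that idea against the merged page from IDEA 3 (it assumes the headings sit inside the loaded articles; untested):

```javascript
// Sketch only: prepend a directory whose entries anchor-link to each article title.
const $toc = $('<div id="toc-page"></div>').prependTo('#down-pdf-page');
$('#down-pdf-page').find('h1, h2').each(function (i) {
    const id = 'toc-anchor-' + i;
    $(this).attr('id', id); // give the heading an anchor target
    $toc.append('<p><a href="#' + id + '">' + $(this).text() + '</a></p>');
});
```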
Another idea is to convert the PDF straight to a Markdown file; what the web mostly recommends is this online tool, PDF to Markdown. I tried it and found one big drawback: the images are lost.
Oh right, one more thing: I also went and cached Juejin (掘金) articles locally with a custom plugin (good news for fellow hoarders); this article teaches you how to cache Juejin articles.
Feel free to leave a comment and chat~
2021.10.09 Update
A PDF printed with Ctrl + P has no bookmarks at all.
Thanks to how the Windows clipboard works, we can simply Ctrl + A to select the entire content and paste it into Typora (a Markdown editor); the content and images carry over perfectly, and Typora generates a hierarchical outline from the headings.
When we then export a PDF from Typora, it comes with bookmarks. Perfect~