Preface

The idea came about because I prefer reading local PDFs and have a bit of a hoarding habit. The core idea: render only the content we need on the web page, strip out the irrelevant parts, then press Ctrl + P to call the system print dialog and save as a PDF. So, here we go.

IDEA 1

We can do whatever we want on our own site, but if we try to pull someone else's site into our own page, we hit a familiar exception: the cross-origin (CORS) warning!

However, an nginx reverse proxy is a perfect fit for getting around cross-origin restrictions. The theory holds, so let's put it into practice.

Open the nginx/conf/nginx.conf file and start editing under the http node.

Add an upstream alias for the target domain

```nginx
upstream mysql {
    server www.runoob.com:80;
}
```

Set the reverse proxy parameters

```nginx
server {
    listen 127.0.0.1:8001;
    location / {
        proxy_pass http://mysql;
        proxy_set_header Host $http_host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        if ($request_method = 'OPTIONS') {
            return 204;
        }
    }
}
```

Start the nginx service by using the start nginx command

My first reaction was: wait, what?! A quick consultation with the almighty Baidu yielded the answer.

Knock on the blackboard!! If a site is served over plain HTTP, anyone can use an nginx reverse proxy to mirror it outright, modifying the content at will along the way!

The typical scenario: the real site is A.com, and we stand up a B.com that reverse-proxies to A.com while injecting ads or even Trojans. A user who stumbles onto B.com thinks they are on the official A site, but is actually browsing our doctored mirror. The risk is self-evident.

So is the nginx reverse-proxy approach dead in the water? Hold on. Let's ask Baidu whether nginx can reverse-proxy an HTTPS site.

There are plenty of solutions; I'll leave the tinkering for a follow-up (too lazy to dig in now, since the point here is just to solve the immediate problem).
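For reference, one common approach is to point `proxy_pass` at `https://` and enable SNI so the upstream serves the right certificate. A rough, untested sketch (the port 8002 and the upstream name `runoob_ssl` are made up for illustration):

```nginx
upstream runoob_ssl {
    server www.runoob.com:443;
}

server {
    listen 127.0.0.1:8002;
    location / {
        proxy_pass https://runoob_ssl;
        # the upstream needs the right Host header and SNI to serve its certificate
        proxy_set_header Host www.runoob.com;
        proxy_ssl_server_name on;
        proxy_ssl_name www.runoob.com;
    }
}
```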

IDEA 2

We can do this by writing a code block in Tampermonkey (the "oil monkey" userscript manager).

  1. Collect the URLs recorded in the left-hand menu
  2. Empty all the elements inside `body`
  3. Traverse the recorded URLs, request each one, and embed the returned `html` element into the `body` node in order
  4. Inspect the fetched elements, resize the window, and note which elements need to be removed

All right, we’re clear. Let’s go

```javascript
// ==UserScript==
// @name     Export runoob (菜鸟教程) docs
// @version  1
// @grant    none
// @match    https://www.runoob.com/mysql/**
// ==/UserScript==
// Or paste the code into the console manually; once the page has loaded, print directly with Ctrl + P
let hrefs = document.getElementById('leftcolumn').children;
```

As for why I only wrote this little bit: my vanilla JS skills aren't up to it…
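For completeness, step 1 (collecting the menu URLs) can be sketched without jQuery. In the browser you would simply use `document.querySelectorAll('#leftcolumn a')`; the `extractHrefs` helper below is a hypothetical, regex-based stand-in written against a plain HTML string so it runs anywhere (a rough sketch, not robust HTML parsing):

```javascript
// Collect every href="" value from an HTML fragment.
function extractHrefs(html) {
  const hrefs = [];
  const re = /<a[^>]*href="([^"]+)"/g;
  let m;
  while ((m = re.exec(html)) !== null) {
    hrefs.push(m[1]);
  }
  return hrefs;
}

// Example: a tiny mock of the runoob left-hand menu
const menuHtml =
  '<div id="leftcolumn">' +
  '<a href="/mysql/mysql-tutorial.html">Intro</a>' +
  '<a href="/mysql/mysql-install.html">Install</a>' +
  '</div>';

console.log(extractHrefs(menuHtml));
// → ['/mysql/mysql-tutorial.html', '/mysql/mysql-install.html']
```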

IDEA 3

Since the solution boils down to writing a script anyway, why not run it straight from the console…

Note: Chrome's DevTools supports saving stored code snippets (Sources → Snippets), so it will keep the code files for us and we can run them directly!

The code of the finished product is shown below

```javascript
{
  let hrefs = [];
  $('#leftcolumn>a').each(function () {
    hrefs.push($(this).attr('href'));
  });
  $('body').empty();
  $('body').append('<div id="down-pdf-page"></div>');
  let $pages = $('#down-pdf-page');
  // hrefs = [hrefs[0], hrefs[1]]; // uncomment to test with just the first two pages
  hrefs.forEach((h, i) => {
    $pages.append('<div></div>');
    try {
      $('#down-pdf-page>div:last').load(h, function () {
        let $article = $('.article-body');
        $('#down-pdf-page>div:last').empty().append($article);
        console.log('Progress:', (i + 1) / hrefs.length);
      });
    } catch (e) {
      // ignore pages that fail to load
    }
  });
  // periodically strip the elements we don't want in the PDF
  setInterval(() => {
    $('#postcomments').remove();
    $('.feedback-btn').remove();
    $('.previous-next-links').remove();
    $('#respond').remove();
    $('#comments').remove();
  }, 2000);
}
```

This code is written for the runoob tutorials: open any of its technical docs (this MySQL one, for example), open the console, paste the code in, wait for all the document pages to finish loading, then Ctrl + P and save to PDF.

A timed function handles cleaning up the elements that need removing as each page loads (after all, debugging is hard, right?).

Conclusion

We have now basically achieved the goal of saving an online document as a local PDF, but one small flaw remains: the PDF has no bookmarks. Examining the document shows the headings are all intact, yet Ctrl + P never generates PDF bookmarks from them, which is uncomfortable; Markdown documents generate an outline from headings automatically.

Given that, one idea is to generate a dedicated table-of-contents page whose entries link to the corresponding headings; technically this is just anchor positioning with `a` tags. It needs further study, but in theory it's feasible.
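A minimal sketch of that idea: given the headings collected from the document, build a directory whose `a` tags target anchors on each heading. `buildToc` and the `id` values here are hypothetical, purely for illustration:

```javascript
// Build an HTML table of contents from a flat list of headings.
// Each heading: { level, text, id }, where id is an anchor set on the heading element.
function buildToc(headings) {
  return (
    '<ul>' +
    headings
      .map(h => `<li style="margin-left:${(h.level - 1) * 20}px">` +
                `<a href="#${h.id}">${h.text}</a></li>`)
      .join('') +
    '</ul>'
  );
}

const toc = buildToc([
  { level: 1, text: 'MySQL Tutorial', id: 'mysql-tutorial' },
  { level: 2, text: 'Install MySQL', id: 'install-mysql' },
]);
console.log(toc);
```

Prepending this markup as the first page before printing would give the PDF a clickable directory, though the links only help inside viewers that honor internal anchors.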

Another idea is converting the PDF straight to a Markdown file; the tool most recommended online is "PDF to Markdown". I tried it and found a big drawback: the images are lost.

Oh right, one more thing: next I'll use a custom plug-in to cache Juejin articles locally (good news for fellow hoarders), and that article will teach everyone how to cache Juejin posts.

Welcome everyone to leave a message to communicate ~

2021.10.09 update

A PDF printed with Ctrl + P still has no bookmarks, so here is a workaround.

Thanks to the Windows clipboard, we can simply Ctrl + A to select the entire page and paste it into Typora (a Markdown editor); the content and images carry over perfectly, and Typora generates a hierarchical outline from the headings.

Exporting a PDF from Typora then carries the bookmarks with it. Perfect ~