Background
Recently, while sprucing up my GitHub homepage, I made a few things that some readers may find interesting.
Here is my home page address: github.com/JS-banana. Take a look if you are interested ~
While editing my personal information, I had an idea: could I sync my blog updates to my GitHub homepage?
That is, whenever I update my blog, my GitHub homepage would automatically pick up the latest posts.
It was just an idea at the time, and I looked into it later. Originally I wanted to write a crawler in NodeJS. That would work, but it has plenty of drawbacks: I could only build a half-finished crawler on my own, and it would not be very reusable, so I ruled it out.
Then I looked at Python's feedparser library and thought it was a good fit. (feedparser is one of the most commonly used RSS libraries in Python, and it makes it easy to get titles, links, and article entries from any RSS or Atom feed.)
I also tried it out and liked the result, so we only need to do two things:
- Implement the Atom feed (for the feedparser library to consume)
- Implement dynamic updating of the README.md file (update the home page after receiving the feed information)
RSS, Atom feeds
RSS feeds should be familiar to most of us. When we look at a lot of big blogs and popular websites and services, we find that they all offer RSS/Atom feeds. So what is RSS? What is Atom?
What is RSS?
- Really Simple Syndication
- Gives you the ability to syndicate the content of your web site
- Defines very simple methods to share and view titles and content
- Files can be updated automatically
- Allows views to be personalized for different web sites
- Written in XML
Why use RSS?
RSS is designed to display selected data.
Without RSS, users would have to come to your site every day to check for new content. This is too time consuming for many users. With RSS feeds (often referred to as News feeds or RSS feeds), users can use RSS aggregators (sites or software that aggregate and categorize RSS feeds) to check your website updates more quickly.
The future of RSS (Birth of Atom)
The future of the protocol is uncertain because of copyright issues with RSS 2.0.
With the future of RSS uncertain and the development of the RSS standard suffering from various problems and shortcomings, Atom can be understood as a simple alternative to RSS.
What is a feed
A feed is essentially a "middleman" between the RSS (or Atom) source and its subscribers, delivering information in bulk. The common feed formats are RSS and Atom, and feed subscriptions are still better known on the web as RSS or Atom subscriptions.
What is a subscription
Subscribing is similar to subscribing to a newspaper or magazine. Almost all RSS/Atom subscriptions are free, although a few "non-mainstream" feeds do charge. A feed only transmits information over the network and generally involves no physical delivery. So if you come across a website you like and prefer to read it in an online or offline reader, you can subscribe, and you can unsubscribe at any time.
Summary
RSS and Atom use similar XML-based formats. The basic structure is the same, with slight differences in how the nodes are expressed. All we need to know is that Atom is an improvement over RSS 2.0.
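To make the "slight difference in the expression of the nodes" concrete, here is a minimal sketch that reads the same post from an RSS 2.0 snippet and an Atom snippet using only Python's standard library. The two sample feeds and the helper names `read_rss` and `read_atom` are made up for illustration; they are not taken from the blog.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"

# Two hypothetical one-post feeds describing the same article.
RSS_SAMPLE = """<rss version="2.0"><channel>
  <title>Example blog</title>
  <item>
    <title>Hello</title>
    <link>https://example.com/hello/</link>
    <pubDate>Sat, 28 Aug 2021 16:25:56 GMT</pubDate>
  </item>
</channel></rss>"""

ATOM_SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example blog</title>
  <entry>
    <title>Hello</title>
    <link href="https://example.com/hello/"/>
    <updated>2021-08-28T16:25:56Z</updated>
  </entry>
</feed>"""


def read_rss(xml_text):
    """RSS 2.0: posts are <item> elements and the link is the element text."""
    root = ET.fromstring(xml_text)
    return [
        {"title": item.findtext("title"), "url": item.findtext("link")}
        for item in root.iter("item")
    ]


def read_atom(xml_text):
    """Atom: posts are namespaced <entry> elements and the link is an href attribute."""
    root = ET.fromstring(xml_text)
    return [
        {
            "title": entry.findtext(ATOM_NS + "title"),
            "url": entry.find(ATOM_NS + "link").get("href"),
        }
        for entry in root.iter(ATOM_NS + "entry")
    ]


print(read_rss(RSS_SAMPLE))
print(read_atom(ATOM_SAMPLE))
```

Both calls return the same title and URL. What differs is the element names (`item` vs `entry`), where the link lives, and the date format.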
Generate Atom subscriptions for your site
Atom subscription base structure
To understand the basic format and syntax of atom.xml, let's look at a simple demo
<!-- Header information -->
<!-- Main body -->
<feed xmlns="http://www.w3.org/2005/Atom">
  <!-- Basic information -->
  <title>Small handsome technology blog</title>
  <link href="https://ssscode.com/atom.xml" rel="self"/>
  <link href="https://ssscode.com/"/>
  <updated>2021-08-28 16:25:56</updated>
  <id>https://ssscode.com/</id>
  <author>
    <name>JS-banana</name>
    <email>[email protected]</email>
  </author>
  <!-- Content area -->
  <entry>
    <title>Webpack + React + TypeScript builds a standardized application</title>
    <link href="https://ssscode.com/pages/c3ea73/" />
    <id>https://ssscode.com/pages/c3ea73/</id>
    <published>2021-08-28 16:25:56</published>
    <update>2021-08-28 16:25:56</update>
    <content type="html"></content>
    <summary type="html"></summary>
    <category term="webpack" scheme="https://ssscode.com/categories/?category=JavaScript"/>
  </entry>
  <entry>...</entry>
  ...
</feed>
The basic information part can be customized. Looking through the rest, we find that we only care about the content of the <entry>…</entry> tags, that is, the basic information of each blog post ~
Therefore, we can generate atom.xml ourselves as long as we follow the specification, format, and syntax, nice😎~
If you don’t want to write your own, try this feed
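As a rough illustration of "following the specification", the sketch below builds a small Atom document with Python's standard library instead of string templates. The function name `build_atom` and the shape of the `posts` dicts are assumptions made for this example. It also writes timestamps in the ISO 8601 / RFC 3339 style (`2021-08-28T16:25:56+00:00`) that the Atom format expects, which differs from the space-separated dates in the demo.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM_NS = "http://www.w3.org/2005/Atom"


def build_atom(site_url, title, author_name, posts, now=None):
    """Return an Atom document as a string.

    posts is a hypothetical list of dicts with "title", "permalink" and
    "date" (a timezone-aware datetime) keys.
    """
    now = now or datetime.now(timezone.utc)
    ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace
    feed = ET.Element("{%s}feed" % ATOM_NS)

    def add(parent, tag, text=None, **attrs):
        node = ET.SubElement(parent, "{%s}%s" % (ATOM_NS, tag), attrs)
        node.text = text
        return node

    # Feed-level basic information
    add(feed, "title", title)
    add(feed, "link", href=site_url + "/atom.xml", rel="self")
    add(feed, "link", href=site_url + "/")
    add(feed, "updated", now.isoformat())
    add(feed, "id", site_url + "/")
    add(add(feed, "author"), "name", author_name)

    # One <entry> per post
    for post in posts:
        url = site_url + post["permalink"]
        entry = add(feed, "entry")
        add(entry, "title", post["title"])  # ElementTree escapes & and < on output
        add(entry, "link", href=url)
        add(entry, "id", url)
        add(entry, "updated", post["date"].isoformat())
    return ET.tostring(feed, encoding="unicode")


sample_posts = [
    {
        "title": "Webpack & React",
        "permalink": "/pages/example/",
        "date": datetime(2021, 8, 28, 16, 25, 56, tzinfo=timezone.utc),
    }
]
print(build_atom("https://example.com", "Example blog", "someone", sample_posts))
```

Building the tree with a library means escaping and tag nesting are handled for you, at the cost of less control over the exact layout of the output.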
Write a function that generates the atom.xml file
Since my blog is built on Vuepress (webpack + vue2.x), I'll use NodeJS as an example
I won't go into the details of reading all the markdown files. Once we have the list of all posts, we do some simple processing; here I only fill in the data we need. If you want the feed to be read in a feed reader, you can enrich the content further ~
const fs = require('fs');
const path = require('path');
const dayjs = require('dayjs');

const DATA_FORMAT = 'YYYY-MM-DD HH:mm:ss';

// posts is the list of all blog post information
// An ampersand (&) in XML must be replaced with &amp;, otherwise there will be syntax errors
function toXml(posts) {
  const feed = `<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Small handsome の technology blog</title>
  <link href="https://ssscode.com/atom.xml" rel="self"/>
  <link href="https://ssscode.com/"/>
  <updated>${dayjs().format(DATA_FORMAT)}</updated>
  <id>https://ssscode.com/</id>
  <author>
    <name>JS-banana</name>
    <email>[email protected]</email>
  </author>
  ${posts
    .map(item => {
      return `
  <entry>
    <title>${item.title.replace(/&/g, '&amp;')}</title>
    <link href="https://ssscode.com${item.permalink}" />
    <id>https://ssscode.com${item.permalink}</id>
    <published>${item.date.slice(0, 10)}</published>
    <update>${item.date}</update>
  </entry>`;
    })
    .join('\n')}
</feed>`;

  // Write the feed to atom.xml in the current working directory
  fs.writeFile(path.resolve(process.cwd(), './atom.xml'), feed, function(err) {
    if (err) return console.log(err);
    console.log('File write succeeded!');
  });
}
Run this file with Node and it should generate an atom.xml file in the current directory
Ok, the Atom feed source is done ~
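The generator's comment only mentions the ampersand, but `<` in a title or `&` inside a link's query string would break the XML in the same way. Below is a small sketch of the escaping step in Python, using the standard library's `xml.sax.saxutils`. The helper `entry_xml` and the sample values are invented for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def entry_xml(title, url):
    """Render one <entry>, escaping the text content and the attribute value."""
    return (
        "<entry>"
        "<title>{}</title>"
        "<link href={}/>"  # quoteattr adds the surrounding quotes itself
        "</entry>"
    ).format(escape(title), quoteattr(url))


snippet = entry_xml("Webpack & React: when a < b", "https://example.com/?a=1&b=2")
print(snippet)

# The result parses cleanly and round-trips to the original strings.
parsed = ET.fromstring(snippet)
print(parsed.findtext("title"))
print(parsed.find("link").get("href"))
```

`escape` handles `&`, `<` and `>` in text, while `quoteattr` additionally takes care of quoting for attribute values.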
Simple usage of feedparser
feedparser is a Python library; a Node version of it can also be found online
Copy the demo snippet into atom.xml and briefly test the usage to see the format of the return value. To show the structure more clearly, I have tidied up the output of the Python script
The atom.xml source file
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Small handsome technology blog</title>
  <link href="https://ssscode.com/atom.xml" rel="self"/>
  <link href="https://ssscode.com/"/>
  <updated>2021-08-28 16:25:56</updated>
  <id>https://ssscode.com/</id>
  <author>
    <name>JS-banana</name>
    <email>[email protected]</email>
  </author>
  <entry>
    <title>Webpack + React + TypeScript builds a standardized application</title>
    <link href="https://ssscode.com/pages/c3ea73/" />
    <id>https://ssscode.com/pages/c3ea73/</id>
    <published>2021-08-28 16:25:56</published>
    <update>2021-08-28 16:25:56</update>
  </entry>
</feed>
The main.py script
import feedparser

blog_feed_url = "./atom.xml"
feeds = feedparser.parse(blog_feed_url)
print(feeds)
The general structure of the output is as follows
{
  bozo: 1,
  // entries
  entries: [
    {
      title: "Webpack + React + TypeScript builds a standardized application",
      title_detail: {
        type: "text/plain",
        language: None,
        base: "",
        value: "Webpack + React + TypeScript builds a standardized application",
      },
      links: [{ href: "https://ssscode.com/pages/c3ea73/", rel: "alternate", type: "text/html" }],
      link: "https://ssscode.com/pages/c3ea73/",
      id: "https://ssscode.com/pages/c3ea73/",
      guidislink: False,
      published: "2021-08-28 16:25:56",
      published_parsed: time.struct_time(), // the parsed date as a time structure
      update: "2021-08-28 16:25:56",
    },
  ],
  // feed
  feed: {
    title: "Small handsome technology blog",
    title_detail: { type: "text/plain", language: None, base: "", value: "Small handsome technology blog" },
    links: [
      { href: "https://ssscode.com/atom.xml", rel: "self", type: "application/atom+xml" },
      { href: "https://ssscode.com/", rel: "alternate", type: "text/html" },
    ],
    link: "https://ssscode.com/",
    updated: "2021-08-28 16:25:56",
    updated_parsed: time.struct_time(),
    id: "https://ssscode.com/",
    guidislink: False,
    authors: [{ name: "JS-banana", email: "[email protected]" }],
    author_detail: { name: "JS-banana", email: "[email protected]" },
    author: "JS-banana ([email protected])",
  },
  headers: {},
  encoding: "utf-8",
  version: "atom10",
  bozo_exception: SAXParseException("XML or text declaration not at start of entity"),
  namespaces: { "": "http://www.w3.org/2005/Atom" },
}
As you can see, we just need to get all the entries and write a function to extract the content we need
def fetch_blog_entries():
    entries = feedparser.parse(blog_feed_url)["entries"]
    return [
        {"title": entry["title"],
         "url": entry["link"].split("#")[0],
         "published": entry["published"].split("T")[0]}
        for entry in entries
    ]
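One detail worth checking: `split("T")[0]` only trims the time when the timestamp uses the ISO form `2021-08-28T16:25:56`. For a space-separated value like the `2021-08-28 16:25:56` in the demo feed, there is no `T` to split on, so the whole string comes back. The sketch below handles a few layouts; the helper name `published_day` and the list of formats are assumptions for this example, not part of feedparser.

```python
from datetime import datetime

# Formats tried in order; extend the tuple if your feed uses another layout.
KNOWN_FORMATS = (
    "%Y-%m-%dT%H:%M:%S%z",  # 2021-08-28T16:25:56Z or ...+08:00
    "%Y-%m-%dT%H:%M:%S",    # 2021-08-28T16:25:56
    "%Y-%m-%d %H:%M:%S",    # 2021-08-28 16:25:56
    "%Y-%m-%d",             # 2021-08-28
)


def published_day(value):
    """Return the YYYY-MM-DD part of a feed timestamp, or the input if unrecognized."""
    text = value.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return value


print(published_day("2021-08-28 16:25:56"))
print(published_day("2021-08-28T16:25:56Z"))
```

Inside `fetch_blog_entries`, this would be used as `published_day(entry["published"])` in place of the `split("T")` call.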
Replace the content of a specified area in the markdown file
The last step: how do we replace the specified area in our README.md home page file, and then push it to GitHub to complete the update?
### Hello, I'm xiao Shuai! 👋
...
Other information
<!-- start -->
This displays the blog information
<!-- end -->
As shown above, nothing needs to change except the designated area that gets updated
For this, we can use Python to find the comments and use the re module (regular expressions) to do the replacement
We mark the area with comments in README.md
<!-- blog starts -->
...
<!-- blog ends -->
Code:
def replace_chunk(content, marker, chunk, inline=False):
    # Match everything between the "<!-- marker starts -->" and "<!-- marker ends -->" comments
    r = re.compile(
        r"<!-- {} starts -->.*<!-- {} ends -->".format(marker, marker),
        re.DOTALL,
    )
    if not inline:
        chunk = "\n{}\n".format(chunk)
    chunk = "<!-- {} starts -->{}<!-- {} ends -->".format(marker, chunk, marker)
    return r.sub(chunk, content)
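Here is a quick way to try the replacement on a string before pointing it at a real README. This is a sketch: the marker pattern is assumed from the `<!-- blog starts -->` / `<!-- blog ends -->` comments, and the sample README text is invented. It passes a function to `sub` so that backslashes in the new content are kept literally rather than treated as regex replacement escapes.

```python
import re


def replace_chunk(content, marker, chunk, inline=False):
    # Assumed pattern: everything between "<!-- marker starts -->" and "<!-- marker ends -->".
    pattern = re.compile(
        r"<!-- {0} starts -->.*<!-- {0} ends -->".format(re.escape(marker)),
        re.DOTALL,
    )
    if not inline:
        chunk = "\n{}\n".format(chunk)
    block = "<!-- {0} starts -->{1}<!-- {0} ends -->".format(marker, chunk)
    # Using a function as the replacement keeps any backslashes in block literal.
    return pattern.sub(lambda _match: block, content)


readme = "# Hi\n<!-- blog starts -->\nold list\n<!-- blog ends -->\nfooter\n"
updated = replace_chunk(readme, "blog", "* new post")
print(updated)
```

Only the text between the two markers changes. Running the function again with the same content leaves the README untouched, which is what lets the scheduled job skip the commit when nothing is new.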
Finally, combined with fetching the feed and reading the file, the complete code is as follows
import feedparser
import json
import pathlib
import re
import os
import datetime

blog_feed_url = "https://ssscode.com/atom.xml"
root = pathlib.Path(__file__).parent.resolve()


def replace_chunk(content, marker, chunk, inline=False):
    # Match everything between the "<!-- marker starts -->" and "<!-- marker ends -->" comments
    r = re.compile(
        r"<!-- {} starts -->.*<!-- {} ends -->".format(marker, marker),
        re.DOTALL,
    )
    if not inline:
        chunk = "\n{}\n".format(chunk)
    chunk = "<!-- {} starts -->{}<!-- {} ends -->".format(marker, chunk, marker)
    return r.sub(chunk, content)


def fetch_blog_entries():
    entries = feedparser.parse(blog_feed_url)["entries"]
    return [
        {"title": entry["title"],
         "url": entry["link"].split("#")[0],
         "published": entry["published"].split("T")[0]}
        for entry in entries
    ]


if __name__ == "__main__":
    readme = root / "README.md"
    readme_contents = readme.open(encoding='UTF-8').read()
    # Keep only the five most recent posts
    entries = fetch_blog_entries()[:5]
    entries_md = "\n".join(
        ["* <a href='{url}' target='_blank'>{title}</a> - {published}".format(**entry) for entry in entries]
    )
    rewritten = replace_chunk(readme_contents, "blog", entries_md)
    readme.open("w", encoding='UTF-8').write(rewritten)
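Because each list item is written as raw HTML, a title containing `<` or `&`, or a URL with a query string, could produce broken markup in the README. A hedged sketch of the rendering step with escaping added is below. `render_entries` is a hypothetical helper, and the entry dicts are assumed to have the same `title`, `url` and `published` keys as in the script.

```python
import html


def render_entries(entries, limit=5):
    """Build the HTML list that goes between the README markers."""
    lines = []
    for entry in entries[:limit]:
        lines.append(
            "* <a href='{url}' target='_blank'>{title}</a> - {published}".format(
                url=html.escape(entry["url"], quote=True),  # also escapes quotes
                title=html.escape(entry["title"]),
                published=entry["published"],
            )
        )
    return "\n".join(lines)


sample = [
    {"url": "https://example.com/a?x=1&y=2", "title": "A <b> & C", "published": "2021-08-28"}
]
print(render_entries(sample))
```

The result can be passed to `replace_chunk` in the same way as `entries_md`.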
I'm not very familiar with Python either, but by following in the footsteps of others I could still use it to achieve the effect I wanted
Recently I have been working with some Python scripting libraries and found them quite interesting. I think Python is worth learning and very helpful in daily use. After all, it is very popular now, and even just as a tool it feels very powerful
Configure a GitHub Actions scheduled task
The script that implements the functionality is done, and now we want it to run automatically after we update the blog
Here we use a GitHub Actions scheduled task directly
Add the file .github/workflows/ci.yml to the project
name: Build README
on:
  workflow_dispatch:
  schedule:
    - cron: "30 0 * * *" # Runs at 0:30 UTC every day; add 8 hours for Beijing time
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Check out repo # get the code branch
        uses: actions/checkout@v2
      - name: Set up Python # python environment
        uses: actions/setup-python@v2
        with:
          python-version: 3.8
      - uses: actions/cache@v2 # dependency cache
        name: Configure pip caching
        with:
          path: ~/.cache/pip
          key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements.txt') }}
          restore-keys: |
            ${{ runner.os }}-pip-
      - name: Install Python dependencies # install dependencies
        run: |
          python -m pip install -r requirements.txt
      - name: Update README # execute script
        run: |-
          python build_readme.py
          cat README.md
      - name: Commit and push if changed # Git commit
        run: |-
          git diff
          git config --global user.email "[email protected]"
          git config --global user.name "JS-banana"
          git pull
          git add -A
          git commit -m "Updated README content" || exit 0
          git push
Done ~
Take a look at the effect:
The script will run once a day to synchronize information about the blog
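GitHub Actions evaluates `schedule` cron expressions in UTC, which is why the local run time is eight hours later in Beijing. A small sketch of that conversion, using a fixed UTC+8 offset (the helper name `cron_utc_to_beijing` is made up for this example):

```python
from datetime import datetime, timedelta, timezone

BEIJING = timezone(timedelta(hours=8))  # fixed UTC+8 offset, no daylight saving


def cron_utc_to_beijing(minute, hour):
    """Convert the minute/hour fields of a daily cron entry from UTC to Beijing time."""
    # The date is arbitrary; only the time of day matters for a daily schedule.
    utc_run = datetime(2021, 8, 28, hour, minute, tzinfo=timezone.utc)
    return utc_run.astimezone(BEIJING).strftime("%H:%M")


print(cron_utc_to_beijing(30, 0))  # "30 0 * * *" -> 08:30
```

To have the job run at 0:30 Beijing time instead, the cron hour would need to be 16 (of the previous UTC day), i.e. `30 16 * * *`.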
Conclusion
Before this I only knew that RSS feeds existed and had no idea about the details. This time I sorted some of them out and tried things for myself, which was pretty good
It feels great to know more than one language. Sometimes it gives you a completely different way of thinking, and maybe a better solution
Help me up, I can still keep learning 😂
References
- Subscription basics: RSS, ATOM, FEED, aggregation, syndication, and subscriptions
- What's the difference between RSS, ATOM, and FEED
- feedparser
- jasonkayzk