
What is the Bay Area Daily? The Bay Area Daily is a personal blog run by an engineer working in San Francisco. Every day, the blog selects 5 high-quality articles, writes a Chinese title and a brief comment or two for each, and recommends them to readers through nearly 10 channels (website, iOS app, Weibo, WeChat, Twitter, etc.).

The "me" writing this article is that engineer. The Bay Area Daily is not a company, nor a startup; it's just one of my side projects. It has been running continuously since the first issue on August 6, 2014.

Seems simple enough, right? When I started, I thought it was very simple and very untechnical: anyone with a high school diploma and basic computer skills could run this kind of blog. But then why, given the same essay topic, does everyone write something different, with some going off topic and others scoring full marks? Everyone can write; how is it that only some people become writers and make a living from writing?

How is the daily content produced? I look for new articles in my spare time every day, and save the ones that intuitively seem good to Pocket. There are many ways to discover new articles, in no particular order: Hacker News, Medium, Quora, the RSS feeds of the many blogs I subscribe to, articles I come across on social networks, articles I find on the spur of the moment when a topic interests me, articles mentioned in conversations with colleagues, etc.

I also read an article or two at odd moments during the day, but usually I go home in the evening, have dinner, and then start reading with full attention. The number of articles I read each day varies: sometimes I read more than 20 in a day and can barely pick out 5; sometimes I read exactly 5, all of them excellent, and stop there. This process usually takes between one and three hours.

I read articles on my iPad, then multitask over to Slack, where I talk to my bot wanqu-ops, a bit like a programmer typing commands on the command line at work. I tell the little robot that this is a link to an article, and it automatically extracts the article's title, the slug from the URL, the image and other information, and inserts it all into the backend database. Every 5 articles, a new issue is automatically generated, with the day's date (Beijing time) and the issue number (for example, issue 502). All I have to do is throw the link to this little robot, and it does the tedious work for me. What would the editor of a normal website do? Use a CMS and manually enter items one by one. That process doesn't take much time, maybe ten-odd minutes a day. But I'm too lazy to spend even ten-odd minutes a day on it.
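As an illustration of the kind of tedious work the bot automates, here is a minimal sketch (my own, not the actual wanqu-ops code) of pulling a slug out of an article URL with the Python standard library:

```python
from urllib.parse import urlparse

def extract_slug(url: str) -> str:
    """Take the last non-empty path segment of a URL as the slug."""
    path = urlparse(url).path
    segments = [s for s in path.split("/") if s]
    return segments[-1] if segments else ""

print(extract_slug("https://example.com/2016/05/how-wanqu-works/"))
# how-wanqu-works
```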

The next step is writing the brief comment, with a command like: wq post 2672 title: This is a good article! Similar commands enter the brief comment and other fields. Here's me reading an article on my iPad and writing a brief comment in Slack:

After gathering 5 articles, I finally give the little robot the order: publish this issue. It then automatically updates the website, pushes a notification to iOS app users, and posts to Twitter, Facebook and other social platforms.

In short, the manual part is mainly reading articles and writing brief comments; the rest of the tedious operations are completed automatically by programs. Given the same essay topic, a programmer writes it differently than an ordinary website editor does, and different programmers write it differently from each other.

I wrote some code for the Bay Area Daily. Since it is only a side project, I don't have high expectations for it, and I strive to achieve the functionality I want with the simplest methods, the least time, and the most quick-and-dirty approach.

In August 2014, the Bay Area Daily was still in the "just for fun" stage, nothing serious. So the website at that time was just static pages generated by Pelican.

Later, the site moved to WordPress. I tried a couple of different themes, and even spent some money on a Product Hunt-style theme.

Then one weekend in March 2015, on a whim, I stayed home for a day and rewrote the site from scratch, built mainly with Python/Django, Celery, RabbitMQ, Postgres and Redis.

Why Django? And how can you build a website in a day? It's simple: over the past two or three years I have written more than 10 web apps, large and small, all side projects for fun, and all built with Django/Postgres. By now it's routine. Copy and paste code you wrote before, and of course you can quickly put together a website. What's more, the first version of the site was much simpler than the one you see today; the current version is the result of more than a year of continuous improvement.

Almost all websites can be reduced to this architecture: Web App, Datastore, Async Worker, Task Queue, and Scheduler. The Web App runs the website code and handles users' requests; all time-consuming tasks (such as sending emails, tweeting, computing statistics, etc.) are thrown onto the Task Queue. The Async Worker then grabs tasks from the Task Queue and processes them offline. The Scheduler runs jobs on a schedule; many websites simply use cron for this.
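The division of labor above can be sketched in a few lines of Python. This is not Celery/RabbitMQ, just the pattern they implement: the web app enqueues slow work and returns immediately, while a worker drains the queue in the background.

```python
import queue
import threading

task_queue = queue.Queue()  # stands in for RabbitMQ
results = []

def async_worker():
    """Stands in for a Celery worker: grab tasks and process them offline."""
    while True:
        task = task_queue.get()
        if task is None:  # shutdown signal
            break
        name, payload = task
        results.append(f"done: {name}({payload})")
        task_queue.task_done()

def handle_request(article_id: int):
    """The web app: enqueue the slow work (tweeting, emailing) and return fast."""
    task_queue.put(("post_to_twitter", article_id))
    task_queue.put(("send_email", article_id))
    return "202 Accepted"

worker = threading.Thread(target=async_worker)
worker.start()
handle_request(502)
task_queue.put(None)
worker.join()
print(results)
# ['done: post_to_twitter(502)', 'done: send_email(502)']
```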

For the Bay Area Daily, the Web App is a Django app running as N uWSGI processes, managed by supervisord and load-balanced by Nginx. The datastore is Postgres plus Redis: most data that needs permanent storage lives in Postgres, while Redis stores article visit counts and other data that only needs to live for a day or two. The Task Queue is RabbitMQ, the Scheduler is Celery Beat, and the Async Worker is Celery. Below is the Bay Area Daily's simple backend architecture:

All of the Bay Area Daily's backend processes, large and small, run on three DigitalOcean virtual machines (costing a bit more than $50 a month). At the current rate of traffic growth, this architecture should last until at least 2020. You read about millions of daily visits in the tech press and forget that most of the world's websites get very, very little traffic. Of course, I also run other side projects on these three virtual machines, so the computing resources aren't wasted. I also keep some simple small projects on Bluehost, on shared hosting (no root access), which is cheap (less than $5 per month).

In May 2015, I wrote an iOS app over a weekend, entirely in Swift; it was a great experience. As mentioned above, this is only a side project, so my expectations are not high. I can shamelessly launch an app written over a weekend, quality flaws and all; not many people use it anyway, and I can gradually improve it later.

The backend API is also a Django app (with some common internal APIs abstracted out and shared with the website), likewise running as uWSGI processes load-balanced by Nginx.

On the weekend I developed the first version of the app, in addition to the basic functionality (browsing articles), I also added:

Crash reporting (using Crashlytics), which notifies me immediately when the app crashes for a user and makes debugging easier. Google Analytics, which tracks how people use the app: how many are online, which pages get the most visits, which buttons get clicked the most, etc. Appirater, which reminds users who have used the app a few times to leave a good review on the App Store. PSUpdateApp, which reminds users to update quickly when a new version is out. These are just routine. Even if you're not an iOS developer, you should at least be aware of them: to launch a product, the basics need to be in place, namely metrics collection, crash reporting, update reminders, etc. Before I started building the app, I asked my colleagues which tools to use, which saved me many detours. See this blog post: Launch an App in Two Days and Four Nights.

The publishing system. The Bay Area Daily has many channels for posting. A modern blog is different from a blog of 10 years ago: 10 years ago a blog was a website; today's blog takes many forms, and the same content can be presented to readers through different channels.

Earlier I mentioned using Slack to talk to a robot. The robot is Hubot: it parses the commands I issue and then calls the REST API I expose in the Django app to publish articles. When an article's status changes from Pending to Published, a set of tasks is triggered, each responsible for publishing to one channel: one task posts to Weibo, one to Reddit, and so on.
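Here is a hypothetical sketch of that fan-out (the names are my invention, not the real code): when an article flips from Pending to Published, one publisher function per channel fires.

```python
PENDING, PUBLISHED = "pending", "published"

# One publisher per channel; each would call the real API in production.
CHANNEL_PUBLISHERS = {
    "weibo":   lambda a: f"weibo: {a['title']}",
    "twitter": lambda a: f"twitter: {a['title']}",
    "reddit":  lambda a: f"reddit: {a['title']}",
}

def set_published(article: dict) -> list:
    """Change status and trigger one publishing task per channel."""
    if article["status"] != PENDING:
        return []
    article["status"] = PUBLISHED
    # In production each call would be an async Celery task, not a direct call.
    return [publish(article) for publish in CHANNEL_PUBLISHERS.values()]

article = {"title": "Issue 502", "status": PENDING}
print(set_published(article))
# ['weibo: Issue 502', 'twitter: Issue 502', 'reddit: Issue 502']
```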

Weibo: published by calling the Weibo API. Posts with the words "from the Bay Area Daily" at the bottom were posted automatically by the publishing system.

WeChat: in theory, once you become a verified account you can call the API to automatically send messages to subscribers. But I never became a verified account, so I have to post manually: Hubot generates the text for each issue, and I just copy, paste and send. Honestly I have basically given up on WeChat and don't care much about it, because the Bay Area Daily's link-sharing model is not a good fit for WeChat.

Twitter: published by calling the Twitter API. Twitter's API is well done.

Facebook: published using Facebook's Graph API.

Reddit: published by calling the Reddit API. Are there Chinese users on Reddit? I'm not sure; I mainly use Reddit as an SEO tool.

Google+: the API Google provides was hard to use and I didn't want to spend time studying it, so I use Buffer to automatically sync my tweets.

Email subscription: MailChimp automatically reads the RSS feed and sends out an email every day.

iOS app push: nothing special, just APNs. Every 2 hours, a Celery Beat job clears out expired device tokens (i.e., users who have disabled push notifications or uninstalled the Bay Area Daily app); that's why the number of people with notifications enabled sometimes shrinks. Periodically clearing invalid device tokens avoids wasting server resources sending push notifications that can never be delivered.

RSS: old, but still useful. You'd be surprised how many online media outlets don't offer RSS feeds these days.
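The periodic token-cleanup job mentioned above would be registered with Celery Beat. Here's a hedged sketch of what such a schedule entry could look like (the task name is my invention; this uses the classic Celery settings-dict style):

```python
from datetime import timedelta

# Celery Beat schedule entry: run the cleanup task every 2 hours.
CELERYBEAT_SCHEDULE = {
    "clean-expired-device-tokens": {
        "task": "push.tasks.clean_expired_device_tokens",  # hypothetical task name
        "schedule": timedelta(hours=2),
    },
}
```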

To avoid the same article being accidentally pushed to the channels above multiple times within a short period, I generate a UUID for each article and store it in Redis. Every time a brief comment is modified, the code checks whether that UUID already exists in Redis: if not, the article is pushed to the channels above and the UUID is recorded; otherwise it is not pushed again.
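In production this check is a single Redis operation (set-if-not-exists with an expiry); here is a pure-Python sketch of the logic, with a dict standing in for Redis:

```python
import time

class FakeRedis:
    """Tiny stand-in for Redis `SET key value NX EX ttl`."""
    def __init__(self):
        self.store = {}  # key -> expiry timestamp

    def set_nx_ex(self, key: str, ttl: int) -> bool:
        now = time.time()
        if key in self.store and self.store[key] > now:
            return False           # key exists and hasn't expired
        self.store[key] = now + ttl
        return True

dedup = FakeRedis()

def maybe_push(article_uuid: str) -> bool:
    """Push to all channels only if this UUID hasn't been pushed recently."""
    if dedup.set_nx_ex(f"pushed:{article_uuid}", ttl=3600):
        return True   # first time in the window: push to Weibo/Twitter/...
    return False      # duplicate within the window: skip

print(maybe_push("123e4567"))  # True  (pushed)
print(maybe_push("123e4567"))  # False (skipped)
```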

Many readers have noticed that the Bay Area Daily reposts articles from months ago on various social media. I explain the reasons in detail in the FAQ. To put it simply: in six months, when the Bay Area Daily's readership has doubled, half of its readers will never have seen the articles I recommended before, so the "old" articles will be "new" to them.

How do I reheat this cold rice? When recommending a new article each day, I judge whether it is evergreen content, i.e., whether it will still be relevant half a year later. If so, I throw the article into a queue (in Postgres). Then, every hour, Celery Beat picks articles from the queue with some probability (different probabilities for different times of day) and posts them to Weibo, Twitter, Facebook, etc. Like this one. The old entries in this queue are cleaned up regularly: some articles really have been posted several times and longtime readers are getting tired of them, or they have aged out (say, a "what to expect from the upcoming Apple Watch" article, now that the Apple Watch has been released), so they have to be removed from the queue.
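A sketch of that hourly repost job, with made-up probabilities; the real version reads the queue from Postgres and posts via the channel APIs:

```python
import random

# Probability of reposting, by hour of day (made-up numbers:
# higher during waking hours, lower overnight).
REPOST_PROBABILITY = {hour: 0.5 if 9 <= hour <= 21 else 0.1 for hour in range(24)}

def maybe_repost(evergreen_queue: list, hour: int, rng: random.Random):
    """Hourly job: with some probability, pick one old article to repost."""
    if not evergreen_queue:
        return None
    if rng.random() < REPOST_PROBABILITY[hour]:
        return rng.choice(evergreen_queue)
    return None  # skip this hour

evergreen = ["old-article-1", "old-article-2", "old-article-3"]
picked = maybe_repost(evergreen, hour=12, rng=random.Random(1))
print(picked)
```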

If you frequently use the Bay Area Daily's iOS app or website, you will have noticed that I publicly disclose the Bay Area Daily's operating numbers: visits, app downloads, the number of users with push notifications enabled, the number of paying app users, etc. The chart below shows where to find the various operating data:

The Bay Area Daily is not a company, and I don't work with anyone else, so it's entirely my call. I think there are very few real-world case studies of running a website on the Chinese-language Internet. Anyone interested in the Internet would of course like to see a real, live operation with real data as learning material, a bit like a textbook case study.

The data is pulled from App Annie's API, Google Analytics' API, and my own backend database. Celery Beat kicks off a job every hour to update it.

For the Bay Area Daily's iOS app, the number I care about most is how many users have push notifications enabled; only with push notifications on can users learn about each day's update in time. My Slack gets a notification every time a new user enables push notifications or pays. Here's a Slack notification on my Apple Watch telling me that a new user has downloaded the app and enabled push notifications:

For more on Slack and Hubot, see this blog post: The Bay Area Daily's First "Employee": Slack/Hubot.

Every article shared by the Bay Area Daily has a visit counter, stored in Redis, which is much cheaper than writing to Postgres on every visit. Losing an hour of data in Redis doesn't matter; this statistic is not that critical.
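The counter itself is just an increment per view; here's a sketch with a Counter standing in for Redis (in production this would be a Redis INCR on a per-article key):

```python
from collections import Counter

visits = Counter()  # stands in for Redis

def record_visit(article_id: int):
    """Called on each article view; an in-memory increment is far
    cheaper than a SQL write per visit."""
    visits[f"visits:{article_id}"] += 1

for _ in range(3):
    record_visit(2672)
record_visit(502)
print(visits["visits:2672"], visits["visits:502"])
# 3 1
```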

The search function lets you search past articles by keyword on both the website and the iOS app. I considered using ElasticSearch, but didn't want one more service to manage and one more point of failure; it would have taken extra time and wasn't worth it.

Later, inspired by this article, I used Postgres full-text indexing directly and implemented search for the website and the app in one evening. Truly quick and dirty.
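The core of Postgres full-text search is a tsvector match. Here's a hedged sketch of the kind of query involved; the table and column names are my invention, not the Bay Area Daily's schema:

```python
# A parameterized query of the sort Postgres full-text search uses.
# `articles`, `title` and `summary` are hypothetical names.
SEARCH_SQL = """
SELECT id, title
FROM articles
WHERE to_tsvector('english', title || ' ' || summary)
      @@ plainto_tsquery('english', %s)
ORDER BY ts_rank(to_tsvector('english', title || ' ' || summary),
                 plainto_tsquery('english', %s)) DESC
LIMIT 20;
"""

def search(cursor, keyword: str):
    """Run the full-text query with a psycopg2-style cursor."""
    cursor.execute(SEARCH_SQL, (keyword, keyword))
    return cursor.fetchall()
```

For serious use you would precompute the tsvector into an indexed column rather than building it per query, but the quick-and-dirty version above already works.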

I usually release a new version of the iOS app once a week. Each update is actually very limited; I typically spend less than an hour a week writing app code.

The site's backend code is also deployed mostly through Slack, again by talking to the little robot: "wq deploy". It checks out the latest master branch of the site's Git repo on the server and restarts the processes. I use a symlink for version control, so if I find a serious bug and want to roll back to the previous version, I just switch the symlink, a matter of seconds.
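The symlink trick can be sketched like this (the paths are hypothetical). Creating a temporary link and renaming it over the old one makes the switch effectively atomic, so rollback is just pointing the link back:

```python
import os

def switch_release(releases_dir: str, version: str, current_link: str):
    """Point `current_link` at releases_dir/version, replacing it atomically."""
    target = os.path.join(releases_dir, version)
    tmp_link = current_link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(target, tmp_link)
    os.replace(tmp_link, current_link)  # atomic rename on POSIX

# Deploy v2, then roll back to v1 by switching the link again:
# switch_release("/srv/wanqu/releases", "v2", "/srv/wanqu/current")
# switch_release("/srv/wanqu/releases", "v1", "/srv/wanqu/current")
```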

Releasing code became fun and low-stress, which encourages the developer (me) to try novel ideas, iterate quickly, and ship small updates often. So the Bay Area Daily keeps evolving. The Bay Area Daily isn't much better today than it was yesterday, but this quarter is certainly better than last.

To see whether the site is down, I use Pingdom; if it goes down I get a text alert. For monitoring I use Datadog (which I learned about in a long chat at their booth at DockerCon two years ago). In theory, monitoring and alerting can be built yourself with open-source software, but the Bay Area Daily is just a side project: when an existing SaaS solution does the job, there's no need to spend time rolling your own, and a homegrown solution surely wouldn't be as good as the professionals'.

For detailed monitoring of the website and app backend, I use Datadog. It makes it easy to see the number of requests and the latency for different endpoints over a given period. See: How Does the Bay Area Daily Monitor the Health of Its System.

For DNS management of the site's domains, I use CloudFlare. CloudFlare brings other benefits too, such as blocking malicious requests and caching static files.

Customer service: letters arrive every day. Most questions can of course be answered by the FAQ. But I've also received many letters of encouragement, and it's really nice to hear that readers have learned and grown from the articles the Bay Area Daily recommends.

Most reader messages on Weibo and WeChat can be answered automatically by keyword matching, which relieves the pressure of replying manually. Taking the time to write an FAQ page saves the time of responding to most emails individually.
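Keyword auto-reply is about as simple as it sounds; a sketch (the keywords and canned replies are invented):

```python
# keyword -> canned reply; invented examples
AUTO_REPLIES = {
    "rss": "RSS feed: see the link in the site footer.",
    "app": "The iOS app is on the App Store; search for Wanqu.",
    "faq": "Most questions are answered on the FAQ page.",
}

def auto_reply(message: str):
    """Return a canned reply if the message matches a keyword, else None."""
    text = message.lower()
    for keyword, reply in AUTO_REPLIES.items():
        if keyword in text:
            return reply
    return None  # fall through to a human reply

print(auto_reply("Where is your RSS feed?"))
```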

The Bay Area Daily has three Git repos (website backend, iOS app, and operations scripts) on Bitbucket, none of them public. The reason for using Bitbucket instead of GitHub is to save money: private repos on GitHub cost money, while Bitbucket gives an individual user an unlimited number of private repos for free.

I prefer to commit code in small pieces; when something goes wrong, you can revert at a finer granularity. A commit typically contains no more than 100 lines of code. For obvious changes I work directly on the master branch and push straight to master; for less obvious ones I create a new branch and send myself a pull request to review my own code.

I often ship unfinished code wrapped in a feature switch. For a silly little project like this, most of my solutions are crude local methods; there's no need to be too academic or too fancy. So a feature switch is literally a row in a Postgres table. The Bay Area Daily's workload is 99.999% read-only, and many of the site's pages are cached in memory and on a CDN, so database access never gets too crazy.
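In production the switch is a row in Postgres; here's the same idea with sqlite (in the standard library) standing in, and invented switch names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stands in for the Postgres database
conn.execute("CREATE TABLE feature_switch (name TEXT PRIMARY KEY, enabled INTEGER)")
conn.execute("INSERT INTO feature_switch VALUES ('new_search_ui', 0)")

def is_enabled(name: str) -> bool:
    """Check the switch; ship unfinished code behind `if is_enabled(...)`."""
    row = conn.execute(
        "SELECT enabled FROM feature_switch WHERE name = ?", (name,)
    ).fetchone()
    return bool(row and row[0])

print(is_enabled("new_search_ui"))   # False: the feature is dark
conn.execute("UPDATE feature_switch SET enabled = 1 WHERE name = 'new_search_ui'")
print(is_enabled("new_search_ui"))   # True: flipped on without a deploy
```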

By the way, I don't know whether to laugh or cry when some young readers (students or junior professionals) email me to pressure me into open-sourcing the Bay Area Daily. It reminds me of celebrities being pressured into donating after a natural disaster. See item 18 of the FAQ for details.

I write code on my iMac at home. On Mac OS X I run a virtual machine with Vagrant + VirtualBox. The VM runs Ubuntu, the same operating system used in production. The code folders are shared between Mac OS X and the VM via Vagrant's synced folders. I write code on Mac OS X in PyCharm and run the server in the VM using PyCharm's Vagrant support.

Data backup: an automatic daily job dumps all the data, compresses it, names it by date, and uploads it somewhere. Where? Any cloud storage service with an API (Dropbox, Box, etc.) will do.

For the Postgres data, I use Django's dumpdata command to dump everything into a JSON file and then gzip it.

For the data in Redis, I gzip the dump.rdb file on disk.
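The daily backup job amounts to: dump, gzip, date-stamp, upload. A sketch of the first three steps (the function name and paths are my own; the upload step depends on which cloud API you pick):

```python
import datetime
import gzip
import json
import os

def backup(data: dict, backup_dir: str) -> str:
    """Dump `data` to JSON, gzip it, and name the file by date."""
    stamp = datetime.date.today().isoformat()  # e.g. "2016-05-29"
    path = os.path.join(backup_dir, f"wanqu-{stamp}.json.gz")
    with gzip.open(path, "wt", encoding="utf-8") as f:
        json.dump(data, f)
    return path  # in production: upload `path` to Dropbox/Box/... here

# path = backup(all_the_data, "/var/backups/wanqu")
```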

Read more in this blog post: How Does the Bay Area Daily Back Up Its Database?

In conclusion: if I had read this article on August 6, 2014, I definitely would not have started the Bay Area Daily side project. It takes too much time: there's a website to build, a publishing system to build, Swift to learn and an iOS app to write, a new app version to release every week, several distribution channels to manage, customer service emails to answer, and one to three hours a day of reading articles and writing brief comments.

Fortunately, by the time of writing, most of the hard work has already been done. This is product iteration in the Internet era: a gradual process in which the person and the product grow together. You can't expect overnight success. Or rather, you can, but it happens on the 1,000th or even 3,600th day.

There is little financial return from running the Bay Area Daily; there is some advertising and in-app revenue, but it is negligible compared with the time invested (multiplied by the hourly wage of even an intern). What keeps me going is the accumulation of knowledge and personal growth: the operational experience and what I learn from the 5 articles every day can be applied to my other projects. Even if nothing comes of it in the end, it will be a good memory. When I die, I can at least look back on my life and say that I ran a blog called the Bay Area Daily, with tens of thousands of readers every day (I'm one of them), and we got a little better every day. That's enough.