About
Creeper is a next-generation crawler that fetches web pages using Creeper scripts. As a cross-platform, embeddable crawler, it can be used in news apps, subscription programs, and similar projects.
Warning: This project is still in stage-1 development. Do not use it in production environments.
Installation
$ go get github.com/wspl/creeper
Script Spec
Node
Nodes form a tree structure that represents the data you are going to crawl.
news[]: page -> $("tr.athing")
	title: $(".title a.storylink").text
	site: $(".title span.sitestr").text
	link: $(".title a.storylink").href
Like YAML, nodes express their hierarchy through indentation.
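The indentation rule can be illustrated with a small Go sketch that turns script lines into a tree. This is not Creeper's parser: the Node type and the parseNodes function are hypothetical, and the sketch assumes one tab per indentation level, a "name:" prefix on every line, and a trailing [] marking an array node. It needs Go 1.20 or later for strings.CutSuffix.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical in-memory form of one line of a Creeper script.
type Node struct {
	Name     string  // field name, e.g. "news" or "title"
	IsArray  bool    // true when the name carries the "[]" suffix
	Body     string  // text after "name:", e.g. `page -> $("tr.athing")`
	Children []*Node // nodes indented one level deeper
}

// depth counts leading tabs; this sketch assumes one tab per level.
func depth(line string) int {
	return len(line) - len(strings.TrimLeft(line, "\t"))
}

// parseNodes builds a tree from script lines, using indentation to decide
// which parent each line belongs to.
func parseNodes(src string) ([]*Node, error) {
	var roots []*Node
	var stack []*Node // stack[i] is the most recent node seen at depth i
	for _, line := range strings.Split(src, "\n") {
		if strings.TrimSpace(line) == "" {
			continue // skip blank lines
		}
		d := depth(line)
		if d > len(stack) {
			return nil, fmt.Errorf("unexpected indentation: %q", line)
		}
		name, body, ok := strings.Cut(strings.TrimSpace(line), ":")
		if !ok {
			return nil, fmt.Errorf("missing ':' in %q", line)
		}
		n := &Node{Body: strings.TrimSpace(body)}
		n.Name, n.IsArray = strings.CutSuffix(name, "[]")

		stack = stack[:d] // drop nodes at this depth or deeper
		if d == 0 {
			roots = append(roots, n)
		} else {
			parent := stack[d-1]
			parent.Children = append(parent.Children, n)
		}
		stack = append(stack, n)
	}
	return roots, nil
}

func main() {
	script := "news[]: page -> $(\"tr.athing\")\n" +
		"\ttitle: $(\".title a.storylink\").text\n" +
		"\tsite: $(\".title span.sitestr\").text\n" +
		"\tlink: $(\".title a.storylink\").href\n"
	roots, err := parseNodes(script)
	if err != nil {
		panic(err)
	}
	for _, r := range roots {
		fmt.Println(r.Name, r.IsArray, len(r.Children)) // news true 3
	}
}
```

A stack indexed by depth keeps the parser to a single pass: each new line only needs to look at the most recent node one level above it.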
Page
Page indicates where to fetch the field data. It can be a town expression or a field reference.
A field reference is an advanced use of Node; you can find the details in ./eh.crs.
If a node has both a page and a fun, the page goes on the left of -> and the fun on the right, that is: page -> fun
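As an illustration of the page -> fun rule, the hypothetical splitPageFun helper below separates the two halves of a node body. It is only a sketch and not part of Creeper's API: it splits at the first ->, and it reports ok = false for a body without an arrow instead of guessing whether such a body is a page or a fun.

```go
package main

import (
	"fmt"
	"strings"
)

// splitPageFun splits a node body written as "page -> fun" at the first "->".
// ok is false when the body has no arrow; the caller then decides how to
// interpret it. Hypothetical helper, not part of Creeper's API.
func splitPageFun(body string) (page, fun string, ok bool) {
	p, f, found := strings.Cut(body, "->")
	if !found {
		return "", "", false
	}
	return strings.TrimSpace(p), strings.TrimSpace(f), true
}

func main() {
	page, fun, ok := splitPageFun(`page -> $("tr.athing")`)
	fmt.Println(page, "|", fun, "|", ok) // page | $("tr.athing") | true

	_, _, ok = splitPageFun(`$(".title a.storylink").text`)
	fmt.Println(ok) // false
}
```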
Fun
A fun describes how the fetched data is processed.
All supported funs are listed below:
Name | Parameters | Description |
---|---|---|
$ | (selector: string) | CSS selector |
html | | inner HTML |
text | | inner text |
outerHTML | | outer HTML |
attr | (attr: string) | attribute value |
style | | style attribute value |
href | | href attribute value |
src | | src attribute value |
calc | (prec: int) | evaluate an arithmetic expression |
match | (regexp: string) | first substring matched by the regular expression |
expand | (regexp: string, target: string) | expand matched strings into the target string |
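The three funs that take a pattern or expression can be approximated with Go's standard library. The sketch below is an assumption about their behavior, not Creeper's code: match returns the first capture group of the first match (or the whole match when the pattern has no groups), expand applies regexp's $1-style template expansion to every match and concatenates the results, and calc evaluates the expression with go/types and formats it with prec digits after the decimal point.

```go
package main

import (
	"fmt"
	"go/constant"
	"go/token"
	"go/types"
	"regexp"
	"strconv"
)

// match returns the first capture group of the first match, or the whole
// match when the pattern has no groups. (Assumed semantics.)
func match(pattern, s string) (string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return "", err
	}
	m := re.FindStringSubmatch(s)
	switch {
	case m == nil:
		return "", nil // no match
	case len(m) > 1:
		return m[1], nil // first capture group
	default:
		return m[0], nil // whole match
	}
}

// expand rewrites every match of pattern in s using target, which may refer
// to groups as $1 or ${name}, and concatenates the results. (Assumed semantics.)
func expand(pattern, target, s string) (string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return "", err
	}
	var out []byte
	for _, idx := range re.FindAllStringSubmatchIndex(s, -1) {
		out = re.ExpandString(out, target, s, idx)
	}
	return string(out), nil
}

// calc evaluates a constant arithmetic expression using Go's constant rules
// and formats it with prec digits after the decimal point.
// Integer operands use integer division, so "7 / 2" yields 3.
func calc(expr string, prec int) (string, error) {
	tv, err := types.Eval(token.NewFileSet(), nil, token.NoPos, expr)
	if err != nil {
		return "", err
	}
	if tv.Value == nil {
		return "", fmt.Errorf("not a constant expression: %q", expr)
	}
	v := constant.ToFloat(tv.Value)
	if v.Kind() != constant.Float {
		return "", fmt.Errorf("not a numeric expression: %q", expr)
	}
	f, _ := constant.Float64Val(v) // second result only reports exactness
	return strconv.FormatFloat(f, 'f', prec, 64), nil
}

func main() {
	m, _ := match(`(\d+) points`, "128 points by pg")
	e, _ := expand(`id=(\d+)`, "item-$1", "/item?id=42")
	c, _ := calc("1 + 2 * 3", 2)
	fmt.Println(m, e, c) // 128 item-42 7.00
}
```

Because this calc reuses Go's constant arithmetic, "7 / 2" is integer division and formats as 3.0; writing 7.0 / 2 gives 3.5. A real implementation may well treat all operands as floating point.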