# cnblogs archiver
## How can I help?
### Binary
Go to the [releases](https://git.saveweb.org/saveweb/cnblogs/releases) page, download `cnblogs_posts_list`, and run it.

WARNING: Do NOT run multiple instances of `cnblogs_posts_list` concurrently from the same IP address, or cnblogs may ban you.

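If several people or scripts might start the binary on the same machine, a lock directory can stop a second copy from launching. This is a minimal sketch, not part of the project: the `run_single_instance` helper and the default lock path are hypothetical, and the lock only protects one host, not other machines sharing your IP.

```bash
#!/usr/bin/env bash
# Hypothetical helper: refuse to start a second copy on this host.
# mkdir is atomic, so only one caller can create the lock directory.
run_single_instance() {
    local lockdir="${LOCKDIR:-/tmp/cnblogs_posts_list.lock}"  # hypothetical lock path
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "ERROR: another instance seems to be running ($lockdir exists)" >&2
        return 1
    fi
    # Run the wrapped command, then release the lock and pass on its exit status
    "$@"
    local status=$?
    rmdir "$lockdir"
    return "$status"
}

# Usage (assumes the binary sits in the current directory):
# run_single_instance ./cnblogs_posts_list
```

If the process is killed before it exits normally, the lock directory is left behind and has to be removed by hand before the next run.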
### With Docker
```bash
export ARCHIVIST=<your_node_name> # a string that can uniquely identify your node (for example: bob-gcloud-514). (Legal characters: letters, numbers, -, _)
```
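Because only letters, numbers, `-` and `_` are allowed in the node name, it may help to check the value before starting the container. A minimal sketch, assuming "letters and numbers" means ASCII `A-Z`, `a-z` and `0-9`; the `validate_archivist` helper is hypothetical and not part of the project.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check a node name against the allowed characters
# (assumed to be ASCII letters, digits, "-" and "_").
validate_archivist() {
    local name="$1"
    if [[ -z "$name" ]]; then
        echo "ERROR: ARCHIVIST is empty" >&2
        return 1
    fi
    if [[ ! "$name" =~ ^[A-Za-z0-9_-]+$ ]]; then
        echo "ERROR: ARCHIVIST contains illegal characters: $name" >&2
        return 1
    fi
    return 0
}

# Usage:
# validate_archivist "$ARCHIVIST" || exit 1
```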
```bash
if [[ -z "$ARCHIVIST" ]]; then
echo "WARN: ARCHIVIST must be set"
exit 1
fi
_image="icecodexi/saveweb:cnblogs"
docker pull "${_image}" \
&& docker stop cnblogs
docker rm -f cnblogs \
&& docker run --env ARCHIVIST="$ARCHIVIST" --restart always \
--volume /etc/localtime:/etc/localtime:ro \
--cpu-shares 512 --memory 512M --memory-swap 512M \
--label=com.centurylinklabs.watchtower.enable=true \
2024-07-17 12:55:34 -07:00
--detach --name cnblogs \
"${_image}"
```
## Archiving stages
### Stage 1: detect all blog IDs (~~finished~~)
Run `cnblogs_rss_detect`.

NOTE: `cnblogs_rss_detect` has already finished, so you don't need to run it.
### Stage 2: iterate over all blog IDs and collect the URLs of all posts (running)
Run `cnblogs_posts_list`.

<!-- ### Stage 3: export the posts' urls.txt and send it to ArchiveTeam -->
### Stage 3: export the URLs of all posts and send them to ArchiveTeam (TODO)
### Stage 4: also download the HTML of all posts ourselves (TODO)