# cnblogs archiver

## How can I help?

Go to the [release](https://git.saveweb.org/saveweb/cnblogs/releases) page, download `cnblogs_posts_list`, and run it.

WARNING: DO NOT run multiple instances of `cnblogs_posts_list` concurrently from the same IP, or you may be banned by cnblogs.

### Docker

```bash
# A string that uniquely identifies your node (for example: bob-gcloud-514).
# Legal characters: letters, numbers, -, _
export ARCHIVIST=
```

```bash
if [[ -z "$ARCHIVIST" ]]; then
    echo "WARN: ARCHIVIST must be set"
    exit 1
fi

_image="icecodexi/saveweb:cnblogs"

# Pull the latest image, then stop and remove any existing container
# before starting a fresh one.
docker pull "${_image}" \
    && docker stop cnblogs
docker rm -f cnblogs \
    && docker run --env ARCHIVIST="$ARCHIVIST" --restart always \
        --volume /etc/localtime:/etc/localtime:ro \
        --cpu-shares 512 --memory 512M --memory-swap 512M \
        --detach --name cnblogs \
        "${_image}"
```

---

NOTE: `cnblogs_rss_detect` has finished; you don't need to run it.

## Archiving stages

### stage1: detect all blogids (~~finished~~)

Run `cnblogs_rss_detect`.

### stage2: iterate over all blogids and collect all posts' URLs (running)

Run `cnblogs_posts_list`.

### stage3: export all posts' URLs and send them to ArchiveTeam (TODO)

### stage4: also download all posts' HTML ourselves (TODO)