
cnblogs archiver

How can I help?

Binary

Go to the release page, download cnblogs_posts_list, and run it.

WARNING: DO NOT run multiple instances of cnblogs_posts_list concurrently from the same IP, or you may be banned by cnblogs.

With Docker

# A string that uniquely identifies your node (for example: bob-gcloud-514).
# Legal characters: letters, numbers, -, _
export ARCHIVIST=<your_node_name>
if [[ -z "$ARCHIVIST" ]]; then
    echo "ERROR: ARCHIVIST must be set"
    exit 1
fi
_image="icecodexi/saveweb:cnblogs"
# Pull the latest image, then stop the old container (if any).
docker pull "${_image}" \
    && docker stop cnblogs
# Remove the old container and start a fresh one with limited CPU and memory.
# The watchtower label lets Watchtower keep this container up to date.
docker rm -f cnblogs \
    && docker run --env ARCHIVIST="$ARCHIVIST" --restart always \
        --volume /etc/localtime:/etc/localtime:ro \
        --cpu-shares 512 --memory 512M --memory-swap 512M \
        --label=com.centurylinklabs.watchtower.enable=true \
        --detach --name cnblogs \
        "${_image}"
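
The script only checks that ARCHIVIST is non-empty. A stricter check against the legal characters listed above could look like this Go sketch. `validArchivist` is a hypothetical helper, and reading "letters" as ASCII letters only is an assumption.

```go
package main

import (
	"fmt"
	"os"
	"regexp"
)

// archivistPattern allows letters, digits, '-' and '_' (ASCII only: an assumption).
var archivistPattern = regexp.MustCompile(`^[A-Za-z0-9_-]+$`)

// validArchivist reports whether name is non-empty and uses only legal characters.
func validArchivist(name string) bool {
	return archivistPattern.MatchString(name)
}

func main() {
	name := os.Getenv("ARCHIVIST")
	if !validArchivist(name) {
		fmt.Fprintln(os.Stderr, "ERROR: ARCHIVIST must be set and may only contain letters, numbers, - and _")
		os.Exit(1)
	}
	fmt.Println("archivist:", name)
}
```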

Archiving stages

stage1: detect all blogids (finished)

run cnblogs_rss_detect

stage2: iterate over all blogids and collect the URLs of all posts (finished)

run cnblogs_posts_list

stage3: export all post URLs and send them to ArchiveTeam (finished)

stage4: also download the HTML of all posts ourselves (TODO)