cnblogs archiver

How can I help?

Binary

Go to the release page, download cnblogs_posts_list, and run it.

WARNING: Do NOT run more than one cnblogs_posts_list at a time from the same IP address, or cnblogs may ban you.
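One way to honor this warning on a single machine is to wrap the binary in a lock. The following is a minimal sketch, assuming a Linux host with `flock(1)` from util-linux; `run_exclusive` and `LOCK_FILE` are hypothetical names, not part of this project. It only prevents a second copy on the same host: it cannot detect other machines that share your public IP.

```shell
#!/usr/bin/env bash
# Hypothetical lock file path; any path writable by the current user works.
LOCK_FILE="${LOCK_FILE:-/tmp/cnblogs_posts_list.lock}"

# run_exclusive CMD [ARGS...]: run CMD only if no other run_exclusive call on
# this host currently holds the lock; otherwise fail immediately.
run_exclusive() {
    (
        # Open (or create) the lock file on file descriptor 9 for this subshell.
        exec 9>"$LOCK_FILE" || exit 1
        # -n: do not wait; fail at once if the lock is already held.
        if ! flock -n 9; then
            echo "another instance is already running; not starting" >&2
            exit 1
        fi
        # The lock is released once fd 9 is closed, i.e. when this subshell
        # and the command it runs have exited.
        "$@"
    )
}
```

For example, `run_exclusive ./cnblogs_posts_list` (assuming the downloaded binary is in the current directory) starts the tool only if no other wrapped copy is running; a second invocation exits with an error instead of waiting.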

With Docker

export ARCHIVIST=<your_node_name> # a string that uniquely identifies your node (for example: bob-gcloud-514). Allowed characters: letters, digits, - and _

if [[ -z "$ARCHIVIST" ]]; then
    echo "ERROR: ARCHIVIST must be set" >&2
    exit 1
fi
_image="icecodexi/saveweb:cnblogs"
docker pull "${_image}"
# Stop and remove any old container; these fail harmlessly when none exists yet (e.g. on the first run).
docker stop cnblogs 2>/dev/null || true
docker rm -f cnblogs 2>/dev/null || true
docker run --env ARCHIVIST="$ARCHIVIST" --restart always \
    --volume /etc/localtime:/etc/localtime:ro \
    --cpu-shares 512 --memory 512M --memory-swap 512M \
    --label=com.centurylinklabs.watchtower.enable=true \
    --detach --name cnblogs \
    "${_image}"
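The script above only checks that `ARCHIVIST` is non-empty, although its comment also restricts the allowed characters. A stricter check could look like the following minimal sketch, assuming "letters" means ASCII letters; `validate_archivist` is a hypothetical helper, not part of the project.

```shell
#!/usr/bin/env bash
# validate_archivist NAME: succeed only if NAME is non-empty and consists of
# the allowed characters (assumed: ASCII letters, digits, '-' and '_').
validate_archivist() {
    # Keeping the regex in a variable avoids quoting pitfalls with [[ =~ ]].
    local re='^[A-Za-z0-9_-]+$'
    [[ $1 =~ $re ]]
}
```

It could replace the `-z` test, e.g. `validate_archivist "$ARCHIVIST" || { echo "ERROR: invalid ARCHIVIST" >&2; exit 1; }`.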

Archiving stages

stage1: detect all blogids (finished)

Run cnblogs_rss_detect.

stage2: iterate over all blogids and collect all posts' URLs (finished)

Run cnblogs_posts_list.

stage3: export all posts' URLs and send them to ArchiveTeam (finished)

stage4: also download the HTML of all posts ourselves (TODO)