
# cnblogs archiver

## How can I help? 

### Binary

Go to the [releases](https://git.saveweb.org/saveweb/cnblogs/releases) page, download `cnblogs_posts_list`, and run it.

WARNING: Do NOT run more than one `cnblogs_posts_list` at a time from the same IP address, or cnblogs may ban you.
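If you launch the binary from a script or cron job, one way to guard against accidental concurrent runs on a single machine is to wrap it with `flock` from util-linux. This is a minimal sketch: the lock file path and the `run_exclusive` helper are hypothetical, not part of this project, and a local lock cannot stop runs on *other* machines that share your IP.

```bash
# Hypothetical lock file; any path writable by the current user works.
LOCK_FILE="${LOCK_FILE:-/tmp/cnblogs_posts_list.lock}"

# Hypothetical helper: run the given command only if no other process on this
# host holds the lock. With -n, flock exits with status 1 immediately instead
# of waiting when the lock is already taken.
run_exclusive() {
    flock -n "$LOCK_FILE" "$@"
}
```

For example: `chmod +x ./cnblogs_posts_list && run_exclusive ./cnblogs_posts_list`. A second invocation started while the first is still running exits right away instead of starting another crawler.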

### With Docker

```bash
export ARCHIVIST=<your_node_name> # a string that can uniquely identify your node (for example: bob-gcloud-514). (Legal characters: letters, numbers, -, _)
```
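The comment above lists the legal characters for `ARCHIVIST`. A small POSIX-shell check like the following can catch a typo before you start the container; the `valid_archivist` helper is our own illustration, not something the image provides.

```bash
# Hypothetical helper: succeed only if $1 is non-empty and consists solely of
# letters, digits, '-' and '_' (the legal characters listed above).
valid_archivist() {
    case "$1" in
        '' | *[!A-Za-z0-9_-]*) return 1 ;;
        *) return 0 ;;
    esac
}

# ${ARCHIVIST:-} avoids an "unbound variable" error under `set -u`.
if ! valid_archivist "${ARCHIVIST:-}"; then
    echo "ERROR: ARCHIVIST='${ARCHIVIST:-}' is empty or contains illegal characters" >&2
fi
```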

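```bash
if [[ -z "$ARCHIVIST" ]]; then
    echo "ERROR: ARCHIVIST must be set"
    exit 1
fi
_image="icecodexi/saveweb:cnblogs"
# Pull the latest image, then stop the old container (if any) so it can be replaced
docker pull "${_image}" \
    && docker stop cnblogs
# Remove the old container; ignore the error on the first run, when none exists yet
docker rm -f cnblogs 2>/dev/null || true
# Start a fresh container. --memory-swap equal to --memory means no swap is used;
# the watchtower label opts the container in to Watchtower auto-updates
# (when Watchtower runs with --label-enable).
docker run --env ARCHIVIST="$ARCHIVIST" --restart always \
    --volume /etc/localtime:/etc/localtime:ro \
    --cpu-shares 512 --memory 512M --memory-swap 512M \
    --label=com.centurylinklabs.watchtower.enable=true \
    --detach --name cnblogs \
    "${_image}"
```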


## Archiving stages

### Stage 1: detect all blog IDs (finished)

Run `cnblogs_rss_detect`.

NOTE: This stage is finished; you do not need to run `cnblogs_rss_detect`.

### Stage 2: iterate over all blog IDs and collect all post URLs (running)

Run `cnblogs_posts_list`.

<!-- ### Stage 3: export post URLs to urls.txt and send them to ArchiveTeam -->

### Stage 3: export all post URLs and send them to ArchiveTeam (TODO)
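This stage has no tooling yet. As a minimal sketch, assuming stage 2 leaves post URLs in plain-text files with one URL per line (the input file names and the `export_urls` helper are hypothetical), merging them into a single deduplicated `urls.txt` could look like this:

```bash
# Hypothetical helper: merge URL lists into one deduplicated output file.
# Usage: export_urls OUTPUT INPUT...
export_urls() {
    out="$1"
    shift
    # Keep only http(s) lines, then sort and drop duplicates.
    cat -- "$@" | grep -E '^https?://' | sort -u > "$out"
}
```

For example, `export_urls urls.txt posts_*.txt`. Sorting with `-u` keeps the output stable between runs, which makes it easy to diff successive exports before sending them on.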

### Stage 4: also download the HTML of all posts ourselves (TODO)