2024-07-17 11:05:33 -07:00

# cnblogs archiver
## How can I help?

Go to the [releases](https://git.saveweb.org/saveweb/cnblogs/releases) page, download `cnblogs_posts_list`, and run it.

WARNING: DO NOT run multiple instances of `cnblogs_posts_list` concurrently from the same IP address, or cnblogs may ban you.

NOTE: We will publish a Docker image soon™ (<30 minutes).

---

NOTE: `cnblogs_rss_detect` has finished, so you don't need to run it.

2024-07-17 11:11:10 -07:00
## Archiving stages
### stage1: detect all blog IDs (~~finished~~)

Run `cnblogs_rss_detect`.

### stage2: iterate over all blog IDs and collect the URLs of all posts (running)

Run `cnblogs_posts_list`.

<!-- ### stage3: export the posts' urls.txt and send it to ArchiveTeam -->

### stage3: export all post URLs and send them to ArchiveTeam (TODO)

### stage4: also download the HTML of every post ourselves (TODO)