2024-07-17 11:05:33 -07:00
# cnblogs archiver
## How can I help?
Go to the [releases](https://git.saveweb.org/saveweb/cnblogs/releases) page, download `cnblogs_posts_list`, and run it.

WARNING: DO NOT run `cnblogs_posts_list` concurrently (on the same IP), or you may be banned by cnblogs.

NOTE: We will publish a Docker image soon™ (<30 minutes).

---
NOTE: `cnblogs_rss_detect` has finished; you don't need to run it.
## Archiving stages
### Stage 1: Detect all existing blogids (completed)
Run `cnblogs_rss_detect`
### Stage 2: Traverse all blogs and collect the URLs of all posts (in progress)
Run `cnblogs_posts_list`
### Stage 3: Export the post URLs to urls.txt and send it to ArchiveTeam
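As a rough illustration of the export step, the sketch below deduplicates a collection of post URLs and writes them to `urls.txt`, one per line. It is not the project's actual exporter: the `export_urls` function, its parameters, and the example URLs are hypothetical, and how the collected URLs are stored or read is not covered here.

```python
from pathlib import Path
from typing import Iterable


def export_urls(urls: Iterable[str], out_path: str = "urls.txt") -> int:
    """Write unique, non-empty URLs to out_path, one per line.

    Hypothetical helper for illustration only. Returns the number of
    URLs written. The first occurrence of each URL keeps its position.
    """
    seen = set()
    unique = []
    for url in urls:
        url = url.strip()  # drop surrounding whitespace and newlines
        if url and url not in seen:
            seen.add(url)
            unique.append(url)
    Path(out_path).write_text(
        "".join(u + "\n" for u in unique), encoding="utf-8"
    )
    return len(unique)
```

For example, `export_urls(["https://example.com/a", "https://example.com/a"])` would write a single line to `urls.txt`. Keeping the first-seen order makes the output stable across runs when the input order is stable.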
### Stage 4: Download post HTML
Keep a plain-text archive of all posts on the site (STWP)