Help Wanted: Hong Kong’s pro-democracy newspaper in imminent danger

Update: Thank you so much for your help and support! This blew up beyond my imagination.

Unfortunately, time is running short. [The website will go down at 23:59 HKT (15:59 UTC).](https://hk.appledaily.com/local/20210623/WSI6PSB2EFCO5JAUMLLZOP4RGM/)

Urgent help is needed on outlinks on [this page](https://hk.appledaily.com/member/). Some of them are on different domains and subdomains.

Apple Daily has other YouTube channels that need saving:

Lifestyle section [https://www.youtube.com/channel/UCCzKM7UMxGCPAgUDXmFw5Gg](https://www.youtube.com/channel/UCCzKM7UMxGCPAgUDXmFw5Gg)

Food section [https://www.youtube.com/user/eatravel](https://www.youtube.com/user/eatravel)

Next Magazine [https://www.youtube.com/channel/UC-8CVMKt5Zlju_i07zhkC-Q](https://www.youtube.com/channel/UC-8CVMKt5Zlju_i07zhkC-Q)

(Courtesy of /u/hkrwa)

**Hop on this Matrix room to talk about progress and targets: #archivinghk:matrix.org**

Other at-risk media outlets include:

[Apple Daily Facebook Page](https://www.facebook.com/hk.nextmedia/videos)

[The Stand News](https://thestandnews.com/) | [YouTube Channel](https://www.youtube.com/channel/UCGe96mv2FcdfQXmtSXYF_fA) | [Facebook Videos](https://www.facebook.com/standnewshk/videos)

[CitizenNews](https://www.hkcnews.com/) | [YouTube Channel](https://www.youtube.com/channel/UC7K4DBOzdITZFOjGkea_CCA)

[inmediahk](https://www.inmediahk.net/) | [YouTube Channel](https://www.youtube.com/user/inmediahk)

[Hong Kong Free Press](https://hongkongfp.com/) | [YouTube Channel](https://www.youtube.com/c/hongkongfp)

DM me on Matrix or Reddit to add to this list.


Original Post:

Hello friends,


Hong Kong’s most-read pro-democracy newspaper, Apple Daily, [has had its office raided, its executives arrested, and its assets frozen](https://www.theguardian.com/world/2021/jun/17/hong-kong-police-arrest-editor-in-chief-of-apple-daily-newspaper-in-morning-raids) in the last few days. This is the death knell of press freedom in Hong Kong, and Apple Daily’s trove of reporting on the 2019 Anti-Extradition Protests and Tiananmen massacre vigils is in [imminent danger of disappearing in a few days](https://www.theguardian.com/world/2021/jun/21/hong-kong-apple-daily-newspaper-crisis-talks-avert-shutdown-advisor-says). I’m a noob at archiving, and a project of this size and deadline is well beyond my means and knowledge. I need help backing this up to the Wayback Machine.


Target #1: [Apple Daily’s Website](https://hk.appledaily.com/)

Unfortunately, there is a paywall. The first article is free; later ones are paywalled, enforced by cookies. In my testing, the paywall doesn’t seem to affect tools like wget or curl. If all else fails, you could try using Googlebot’s user agent:

`Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Googlebot/2.1; +http://www.google.com/bot.html) Chrome/91.0.4472.114 Safari/537.36`

(replace “91.0.4472.114” with the [latest Chrome version](https://chromereleases.googleblog.com/))

If you still find yourself hitting the paywall, keep saving anyway. The paywall is JavaScript-based, so all the content is still in the page.
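
If you’d rather script this than fight with wget flags, here is a rough Python sketch of a request that sends the Googlebot user agent above. It only uses the standard library. The function names `build_request` and `fetch` are made up for illustration, and I have not confirmed that the site still answers this user agent the same way, so treat it as a starting point.

```python
import urllib.request

# User agent string copied from the post; only the Chrome version is variable.
GOOGLEBOT_UA = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
    "Googlebot/2.1; +http://www.google.com/bot.html) "
    "Chrome/{chrome} Safari/537.36"
)


def build_request(url, chrome_version="91.0.4472.114"):
    """Return a Request object that carries the Googlebot user agent."""
    ua = GOOGLEBOT_UA.format(chrome=chrome_version)
    return urllib.request.Request(url, headers={"User-Agent": ua})


def fetch(url, timeout=30):
    """Download one page and return its raw bytes (hypothetical helper)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read()
```

Calling `fetch("https://hk.appledaily.com/archive/20210619/")` should give you the raw HTML of that day’s index, which you can write to disk as-is.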

The archive is conveniently accessible at URLs like [https://hk.appledaily.com/archive/20210619/](https://hk.appledaily.com/archive/20210619/)

You can just iterate through the dates. **Please prioritize the period of February 2019 to now.**
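
Iterating through the dates can be sketched like this. The URL pattern comes from the example link above. The start date of 1 February 2019, the newest-first order, and the name `archive_urls` are my own assumptions, so adjust them as needed.

```python
from datetime import date, timedelta

# Pattern taken from the example URL; the date is formatted as YYYYMMDD.
ARCHIVE_URL = "https://hk.appledaily.com/archive/{:%Y%m%d}/"


def archive_urls(start, end):
    """Yield one archive URL per day, newest first, from end back to start."""
    day = end
    while day >= start:
        yield ARCHIVE_URL.format(day)
        day -= timedelta(days=1)


# Priority window from the post: February 2019 up to the example date.
priority = list(archive_urls(date(2019, 2, 1), date(2021, 6, 19)))
```

Going newest-first is just one choice; if other people are already crawling recent articles, walking the range in the other direction may avoid duplicating their work.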

I’m not exactly sure if that covers everything. It might still be a good idea to crawl the website.

Many articles contain videos, but youtube-dl doesn’t seem to work. I’m out of ideas on how to get them.
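
One approach reported in the comments below: each article page embeds a JSON object after `Fusion.globalContent`, and the video URLs are somewhere inside it. Here is a minimal Python sketch of that idea. The exact key names inside the JSON aren’t spelled out in this thread, so the sketch simply scans every string value for mp4/m3u8 links. The function names `global_content` and `media_urls` are hypothetical, and it uses the JSON decoder instead of a bare regex so that a `;` inside a string doesn’t cut the object short.

```python
import json
import re

# Matches the assignment that precedes the embedded JSON object.
MARKER = re.compile(r"Fusion\.globalContent\s*=\s*")
MEDIA_EXT = (".mp4", ".m3u8")


def global_content(html):
    """Parse the JSON value assigned to Fusion.globalContent, or return None."""
    m = MARKER.search(html)
    if m is None:
        return None
    try:
        # raw_decode stops at the end of the first complete JSON value,
        # so the trailing ';' and the rest of the script are ignored.
        obj, _end = json.JSONDecoder().raw_decode(html, m.end())
    except json.JSONDecodeError:
        return None
    return obj


def media_urls(node):
    """Recursively collect string values that look like mp4 / m3u8 URLs."""
    found = []
    if isinstance(node, dict):
        for value in node.values():
            found.extend(media_urls(value))
    elif isinstance(node, list):
        for value in node:
            found.extend(media_urls(value))
    elif isinstance(node, str):
        path = node.split("?", 1)[0]  # ignore any query string
        if node.startswith("http") and path.endswith(MEDIA_EXT):
            found.append(node)
    return found
```

The mp4 links can be downloaded directly; the m3u8 links are stream playlists, which the same commenter handed to youtube-dl.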


Target #2: [The YouTube Channel](https://www.youtube.com/user/appleactionews)

Start with this playlist: [https://www.youtube.com/playlist?list=PLQcmGU2t4gsp4RMgrqnMiaJq24FJyWGhz](https://www.youtube.com/playlist?list=PLQcmGU2t4gsp4RMgrqnMiaJq24FJyWGhz)

Please note that the channel contains many hours-long livestreams. Those are important, as they are live recordings on the frontline of the protests. I have no idea how much space that’s going to take, sorry.

Again, **please prioritize the period of February 2019 to now.**
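
If you are using youtube-dl, its `--dateafter` option can restrict a run to the priority window. Below is a small Python sketch that builds such a command. The cutoff `20190201`, the file name `archive.txt`, and the function names are my own assumptions; the format and output template are borrowed from a command shared in the comments.

```python
import subprocess


def youtube_dl_command(url, date_after="20190201", archive_file="archive.txt"):
    """Build a youtube-dl argument list that skips uploads before date_after."""
    return [
        "youtube-dl",
        "--dateafter", date_after,           # only videos uploaded on/after this date
        "--download-archive", archive_file,  # remember finished videos across runs
        "--write-info-json", "--write-description", "--write-thumbnail",
        "-i",                                # keep going when one video fails
        "-f", "bestvideo[ext=mp4]+bestaudio[ext=m4a]/bestvideo+bestaudio",
        "-o", "%(upload_date)s-%(title)s.%(ext)s",
        url,
    ]


def run(url):
    """Run youtube-dl on url and return its exit code (needs youtube-dl installed)."""
    return subprocess.run(youtube_dl_command(url)).returncode
```

Passing the arguments as a list avoids shell quoting problems with the `?` and `%(...)s` characters. Note that youtube-dl still has to look up each video’s metadata to check its date, so this saves disk space and bandwidth but not necessarily much time.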

**Save the website first, as that’s most likely to go down sooner.**


Thank you so much for your help. You’ll be doing all Hongkongers a big favour.


P.S.: I’ve been fighting ArchiveBox for the past week to no avail. If you know a more noob-friendly alternative, please comment thanks.

  1. https://appledaily-hk-appledaily-prod.cdn.arcpublishing.com/ is still up, I will try to download it with HTTrack

    Edit: It works and I am downloading now. The pages come with the paywall warning, but I will save them anyway and hope someone will be able to disable the paywall one day.

    Edit: I’m now getting 503 errors.

  2. I have the pdf of the last newspaper, where can I post it? I’m new to data hoarding

    Edit: Wrong account but reach out to me on this account still

  3. Would like to post an update: Apple Daily’s YouTube channel is down. ([Source](https://twitter.com/alvinllum/status/1407735355445420035))

    However, the Facebook Page is still up and running as of 0027 HK Time

    (Also a suggestion: if the videos from Apple Daily cannot be archived any more, one should move on to archiving the other at-risk media outlets listed in the OP. I would recommend starting with the livestreams on StandNews.)

    As an HKer, thank you archivists for doing everything you can to preserve our history.

  4. Another non-archiver here, want to thank all for your hard work!

  5. Would like to add to the list of at-risk media outlets: StandNews has a lot of livestreams documenting the 2019 protests in Hong Kong and the aftermath. They are on its Facebook page. Please help back them up:


    [Standnews Facebook Page](https://www.facebook.com/standnewshk/videos)

  6. Official announcement: [We are sad to inform you that Apple Daily’s web and app content will no longer be accessible no later than Wednesday 23 June 2021, 2359 HKT.](https://hk.appledaily.com/local/20210623/WSI6PSB2EFCO5JAUMLLZOP4RGM/), aka 5 hours from now.

  7. An update for you all: Apple Daily is going to cease operations at 0000 Hong Kong time on 6/24 (1600 GMT on 6/23). I am afraid we do not have much time left.

    This project will be greatly appreciated by all Hongkongese.



  8. Apple Daily and its sister publications have more than one YouTube channel. Is anyone working on them?

    Lifestyle section [https://www.youtube.com/channel/UCCzKM7UMxGCPAgUDXmFw5Gg](https://www.youtube.com/channel/UCCzKM7UMxGCPAgUDXmFw5Gg)

    Food section [https://www.youtube.com/user/eatravel](https://www.youtube.com/user/eatravel)

    Next Magazine [https://www.youtube.com/channel/UC-8CVMKt5Zlju_i07zhkC-Q](https://www.youtube.com/channel/UC-8CVMKt5Zlju_i07zhkC-Q)

  9. Hello my friends all over the world. Thank you so much for helping Hongkongers.

    This is an important message:

    UPDATE: The digital version of Apple Daily will be offline in less than 3 hours, ending its service after 2359 23/6 HKT.

    The digital version of Apple Daily will NOT be available after 1159 26 June Hong Kong Time.

    [Report from the Guardian](https://www.theguardian.com/world/2021/jun/23/hong-kong-apple-daily-symbol-of-pro-democracy-movement-to-close?CMP=Share_AndroidApp_Other)

  10. Need help downloading the video files on their webpage. They have reduced uploads to YouTube since they implemented a soft paywall.

  11. The people and the work being done here is honestly inspiring me to learn something that can help do my part

  12. I have nearly 25TB available. Any way I can help? The only tool I know is youtube-dl, but it gets throttled to 80KB/s or less.

  13. Not an archiver myself, but best of luck guys. The Internet needs more people like you.

  14. A huge thank you from another HongKonger to you guys for helping us.

  15. I don’t know how others are archiving from YouTube but here’s the youtube-dl parameters I’m using, and YouTube is throttling me down to ~70kB/s on some downloads even with a current cookies file. Logging into youtube-dl seems like a flaky pain in the ass.

    Here’s the command I’m using:

    `youtube-dl --cookies ~/cookies.txt --download-archive ./archive.txt --write-description --write-info-json --write-annotations --write-sub --write-auto-sub --write-thumbnail -i -f 'bestvideo[ext=mp4]+bestaudio[ext=m4a]/bestvideo+bestaudio' -o '%(upload_date)s-%(title)s.%(ext)s' 'https://www.youtube.com/playlist?list=UUeqUUXaM75wrK5Aalo6UorQ'`

    Anyone know of a trick to get around that while still maintaining bestaudio+bestvideo? I’ve got a 500mbit connection and I could rip this down in a hurry otherwise.

  16. Someone call Jason Scott and Archive Team.

  17. I’m not seeing a paywall on the English version; maybe it’s been disabled? I’m using Firefox with aggressive content and ad blocking, which might be why.

  18. >ArchiveBox

    It only runs on Linux, if that helps. Use a virtual machine or Windows Subsystem for Linux 2.

  19. I am not one of the people of Hong Kong, but a guy from the UK and I just wanted to say how inspired I’ve been by everyone. There seems to be a real fighting spirit still even though it looks as if the ship is sinking. This truly breaks my heart as I cannot stand the Chinese rule never mind living under it. I wish you all the best and I will see if I can download any of this if it helps. Good luck

  20. Shit, I’m literally in a forest for the rest of the week. If it’s still there by Saturday, I’ll back it up and prepare a deep web archive.

  21. Sorry if this is not the most relevant, but are there currently any projects or plans to archive Apple Daily’s Twitter feed and also RTHK’s? ([https://twitter.com/appledaily_hk](https://twitter.com/appledaily_hk) and [https://twitter.com/rthk_enews](https://twitter.com/rthk_enews))

    Both accounts have a huge back catalogue (over 200k and 100k tweets respectively), and ideally it would be an ongoing archive. I haven’t been able to find any easy/existing solutions that will work; it’s honestly way beyond my technical level. Help much appreciated.

  22. I have unlimited Google Drive storage. I will try to back up their YouTube channel for you guys.

  23. I’m pretty sure I have a subscription, through my phone. Since I’m stupid, tell me how to fish the cookie out and I’ll wget or curl or whatever the kids do these days.

  24. Update:
    I was able to save all videos from 2012 and 2013, but only some from 2014 before it went dark. Sorry. I will upload what I got to [Archive.org](https://Archive.org) over the next days/weeks.

    2012: [https://archive.org/details/hk-apple-daily-2012](https://archive.org/details/hk-apple-daily-2012)

    2013: [https://archive.org/details/hk-apple-daily-2013](https://archive.org/details/hk-apple-daily-2013)

    2014: [https://archive.org/details/hk-apple-daily-2014](https://archive.org/details/hk-apple-daily-2014) (partial)

  25. The livestreams are actually important because they could contain raw footage of police brutality that has happened since the 2019 Hong Kong protests. Perpetrators may be seeking foreign residency, and this could be the only evidence to stop them.

  26. I have access to paywall, subscribed few days ago, please contact me on telegram!!

  27. I [brought this up](https://old.reddit.com/r/Archiveteam/comments/o2diiu/hks_apple_daily_raided_might_be_a_good_time_to/) on /r/Archiveteam a few days ago, and it looks like [ArchiveBot](https://wiki.archiveteam.org/index.php/ArchiveBot) is already scraping hk.appledaily.com (but it has a long way to go).

    /u/JustAnotherArchivist said he’d get another job running for en.appledaily.com later. I just checked and there’s a new job running for tw.appledaily.com.

    However, looking at the URLs in the status, it’s still working on 2021 articles, so there might be some value in targeting 2019/2020 articles if the shutdown is imminent.

  28. I’m currently working on a Python script to download the archive of the Apple Daily website, specifically the HTML and the videos from 20190101 onwards.

    For the videos, ~~I just used a simple regex for the cdn mp4 url of the video.~~ I am downloading using this method: each article embeds a JSON object between `Fusion.globalContent` and the following `;`, which a regex can extract. That JSON object contains the video URLs. I download the mp4 files directly and use youtube-dl for the m3u8/ts streams.

    update 2021-06-21 1836BST: Using 10 threads, 26k articles and 3,282 (52GB) videos downloaded so far.

    2026BST: 58k articles, 7,295 videos (123GB).

    2253BST: 105k articles, 15k videos (247GB). all of 2019 videos downloaded

    0039BST: 112k articles, 17k videos (309GB). downloading 2019 images.

    1225BST: 139k articles, 23k videos (508GB)

    1627BST: 142k articles, 24k videos (548GB). all 2019-2020 videos downloaded.

    2019-2020 articles archive json: https://mega.nz/file/A0kUXJhb#A3mZW947XDRRcuy1zRp76l9lEzL3QMmx_-yXSQHSubY

    2124BST: 2019-01-01 to 2021-06-19 all 154k articles and 27k videos downloaded (900GB). starting to download all pictures 2019- and years 2014-2019.

    0000BST: 2019-01-01 to 2021-06-19 all articles, 109k images and videos downloaded.

    2021-06-23 1928BST: 1.8TB of data from the website mostly downloaded, from 2014 onwards: articles, images, video.

    2328BST: 2014-01-01 to 2021-06-22 ~2TB website data downloaded, articles, images, videos.

    2021-06-26 update: Archive Team has a copy of all the URLs/metadata that I have, and has been redownloading the videos/images directly from the CDN server; surprisingly, it’s still returning data. They are almost done downloading everything.

  29. Archive Team has a tool called ArchiveBot, used to copy at-risk or critical sites to the Internet Archive.

  30. Man, autocratic incarnations everywhere in the 21st century. This might soon happen to press freedom in India.

  31. Could store on IPFS once archived. Y’all using a python script for scraping?

  32. I can back up 9TB worth of YouTube videos, so I’ll start there.

    EDIT: Half way through the playlist now, will start downloading the rest of their videos too when done. Will pick up some more hard drives as needed.

  33. Didn’t the CCP already remove most of the data?

  34. What do you mean? The CCP took over HK nearly a year ago (2020), after the year-and-a-half-long protests (2019). How did the newspaper operate as pro-democracy for that long under CCP control?