
Archiving the Gaza conflict

Hi guys, I’m sure you’ve seen the news. I’m not gonna get political, but I want to back up and archive the footage from both sides in perpetuity. All of it.

It’s my first build, but I’ve already got the system somewhat figured out. What I’m struggling with is filtering and finding the photos, videos and articles.

Many people post the same videos over and over again, and I don’t want to overwhelm my drives with GBs of redundant data.

Do you know how I could efficiently find and filter what I’m looking for? If not, that’s okay; I’ll do it by hand.

Thanks in advance! I just want to keep a record of this stuff for when the news trend passes.

Greetings.

Edit: Wow, man. Thanks for the support. I’m really pumped to start this project. I’ll be sure to check all your comments and recommendations as soon as I have a couple of minutes. I’ll try to gather support from a local news agency to see if they can publicize the project to extend its reach and possibly get hold of footage that isn’t publicly available online.

Ily.


27 Comments

  1. I’d recommend [ArchiveBox](https://archivebox.io), it takes care of extracting videos and media files using youtube-dl, and it also saves to Archive.org for redundancy.

    For particularly difficult pages I recommend https://ArchiveWeb.page and https://webrecorder.io, they have the best archival and replay tech for JS-heavy / media-heavy pages.
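
    A minimal sketch of the ArchiveBox route, under assumptions: the collection directory name and `urls.txt` are illustrative, and ArchiveBox invokes youtube-dl on media pages as described above.

    ~~~
    ## Create an ArchiveBox collection, then feed it URLs one at a time
    ## or in bulk from a text file with one link per line
    pip install archivebox
    mkdir gaza-archive && cd gaza-archive
    archivebox init
    archivebox add 'https://example.com/some-article'   ## placeholder URL
    archivebox add < urls.txt
    ~~~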

  2. I’m using youtube-dl to archive videos from Twitter. No deduplication, though. Is there any way to download all the videos from a Twitter thread, or even from a Twitter username?
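
    One possible answer, sketched under assumptions: snscrape (a separate scraper, not part of youtube-dl) can enumerate a user’s tweet URLs, which can then be fed to youtube-dl or yt-dlp; `SomeUser` is a placeholder account name.

    ~~~
    ## List tweet URLs for one account with snscrape, then try each with
    ## yt-dlp; tweets without video simply fail and are skipped
    snscrape twitter-user SomeUser | while IFS= read -r url; do
        yt-dlp "$url" || true
    done
    ~~~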

  3. Check out syrianarchive and how they do it. They have more hours of footage than hours in the actual war.

  4. Reuters has had a live stream of the Gaza Strip going for the past few days. I would recommend giving that a look.

  5. So here is a bash script that will archive articles taken from Google News, from a previous project of mine. You might have to modify it slightly if you want to stick it into a cron job, and you might also want to use keywords other than “Gaza”.

    ~~~
    #!/bin/bash

    function __longnow(){
        ## Use: takes a txt file with one link per line and pushes all the links to the Internet Archive
        ## References:
        ## https://unix.stackexchange.com/questions/181254/how-to-use-grep-and-cut-in-script-to-obtain-website-urls-from-an-html-file
        ## https://github.com/oduwsdl/archivenow
        ## For the double underscore, see: https://stackoverflow.com/questions/13797087/bash-why-double-underline-for-private-functions-why-for-bash-complet/15181999
        input=$1
        counter=1
        while IFS= read -r line
        do
            if [ $((counter % 15)) -eq 0 ]
            then
                printf "\nArchive.org doesn't accept more than 15 links per min; sleeping for 1 min...\n"
                sleep 1m
            fi
            echo "Url: $line"
            archivenow --ia "$line" ## alternatively, archivenow --all "$line" to use all archive services rather than just the Internet Archive
            counter=$((counter+1))
        done < "$input"
    }

    ## Fetch the Google News RSS feed for "Gaza"; wget saves it to a local file
    ## named "search?q=Gaza" (spaces in multi-word queries must be encoded as %20)
    echo 'Gaza' | sed 's/ /%20/g' | sed 's|^|https://news.google.com/rss/search?q=|' | xargs wget --quiet

    ## Parse the XML and append each article's title, pubDate, and link to listofnews.txt,
    ## with a blank line after every third line to separate articles (GNU sed addressing)
    xmllint --format 'search?q=Gaza' 2>/dev/null | grep -E 'title|pubDate|link' | sed -E 's/.*>(.*)<.*/\1/' | sed '0~3 G' >> listofnews.txt

    ## Extract just the links, creating a list to be fed to the archiver function above
    xmllint --format 'search?q=Gaza' 2>/dev/null | grep 'link' | sed -E 's/.*>(.*)<.*/\1/' > tempforarchiver.txt

    __longnow tempforarchiver.txt

    rm 'search?q=Gaza'
    rm tempforarchiver.txt
    ## Add this to cron with something like
    ## $ crontab -e
    ## 30 22 * * * /the/location/of/this/file   (without the leading "## " in the actual crontab)
    ## This might give you some grief if bash or the archivenow utility can't be found from within the cron instance.
    ~~~

  6. I would like to help you make multiple copies of the archive, if you’re open to sharing the data.

  7. I only ever lurk here, but it’s cool that, sorting by controversial on a thread concerning the most infamously polarizing topic on Reddit, only 2/58 comments (3.4%) have a negative score.

    There’s somewhat of a tacit solidarity amongst users here, a “we’re all trying to do the same thing and are thus on the same team” kind of aura, similar to doomsday prepper and open-source coding subreddits.

    Anybody who is a fan of logic and science can ascertain that the footage OP wants to archive will inevitably be invaluable for history, ethics, and future progress toward diplomacy. No matter what side you’re on in the Gaza conflict, we can all agree on the maxim that preserving footage documenting its chronology is an imperative.

  8. I think the most important videos are probably the Kan 11 live streams (an Israeli news channel); I don’t know if there is wider coverage than that. There are probably some news channels in Gaza with good streams too.

  9. Critical task. Make sure you only back up from good sources. A lot of misinformation and fake videos from other wars (like Syria) are being shared and claimed to be from the current conflict, so make sure you filter out all the nonsense. I’ve been hoarding videos that civilians captured with their phones. PM me if interested.

  10. Most deduplication software only dedups images and other files, mostly based on similar filenames or file sizes; some of it supports deduping on image content, but not video content, which seems to be the primary type of content you plan on retrieving.

    To dedup video, I found, after years of searching, one program that works well enough to be useful: [Video Duplicate Finder by 0x90d](https://github.com/0x90d/videoduplicatefinder).

    It’s open source and very easy to use, with a GUI or on the command line. It builds a database of screenshots at different timepoints in each video and compares them. It works extremely well: it can find duplicates that differ in file size, video quality (bitrate, resolution), and even duration. It’s the fastest and most reliable video deduplicator I have ever used; others are gadgets compared to this one, which is crazy considering that the whole code, including the algorithm, is open source and free, while others have to be paid for. Rarely, some videos are not properly matched, so you do need to check manually if you want to retain a maximum of videos; otherwise, if you don’t mind losing a few, you can just select all duplicates and remove them.

    I suggest a matching threshold (“Percent” at the top of the window) of 85%, as I found this struck the best balance between matching dupes and an acceptably low rate of false positives, though at that level you’ll need to review the matches more carefully. A good intermediate is a step-wise process: first set a high matching threshold like 99% to trim out the big bulk of duplicates, which you can delete right away with very high certainty and without checking; then lower the threshold to 95% to find less obviously matching videos that you do need to check (there will be far fewer now that the 99%-similarity bulk is gone); then decrease the threshold to 85%, or even 75% if you really want to save storage space.

    See this previous discussion on reddit: https://www.reddit.com/r/DataHoarder/comments/de0m4x/video_duplicate_finder_2_find_duplicate_videos/

    However, note that the redditors there were incorrect about how the similarity algorithm works: the app doesn’t just take one screenshot but several, spread along the video. It only shows one matched thumbnail in the GUI for quick comparison, so you can get a quick idea of whether both videos indeed match, but you can easily change the number of thumbnails in the settings (I have set it to 4), and this doesn’t affect how videos are compared (only the similarity thresholds affect that). You can also, of course, compare only files above a certain file size, etc.

  11. It’s a hard thing to do. The videos may look redundant but may be from different angles, of different clarity, and such. I’d suggest pulling all you can get, sorting by the basics, and then simply viewing the clips. I know it’s tedious, but you are more likely to lose something by having a computer algorithm sort for you.

    I’m not sure there will be a lot of footage after today. It sounded like the Israelis blew up the media building, stating that it “was hiding terrorists…” One media source said they were targeting all media institutions, but I have not looked much deeper. The whole thing is absolutely disgusting. I couldn’t believe the footage of them tear-gassing a mosque.

  12. Download to a workstation. Take a hash of the downloaded file and compare the hash to a list. If it’s already on the list, delete the file; if not, send it to the prod drives.

    It means the drive in the workstation is probably going to get worn the hell out, and if a file has even one bit flipped you won’t delete it, but it’s best to keep such things for posterity. Future researchers might be interested in even incredibly small changes like that, especially if they know where the file comes from.

    Keep absolutely every bit of metadata on the files that you can as well. Good luck man.
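
    A minimal sketch of that workflow, with assumed names (`incoming/` for fresh downloads, `prod/` for the long-term drives, `hashlist.txt` for known hashes; none of these come from the comment above):

    ~~~
    #!/bin/bash
    ## Hash each new download; exact duplicates are deleted, new files are
    ## recorded and moved to the production drives
    mkdir -p prod && touch hashlist.txt
    for f in incoming/*; do
        h=$(sha256sum "$f" | cut -d' ' -f1)
        if grep -qF "$h" hashlist.txt; then
            rm "$f"                      ## byte-identical duplicate: drop it
        else
            echo "$h" >> hashlist.txt    ## new file: record hash, keep it
            mv "$f" prod/
        fi
    done
    ~~~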

  13. I’d just find a way to scrape everything relevant from the last 2 weeks, then find all the video files and manually skim for duplicates by file size, duration, and maybe their thumbnails (a small helper for that is sketched after the links below).

    You can expand further after by digging into prior incidents as it’s more likely new content will get lost rather than old content.

    What I’ve found/saved:

    https://old.reddit.com/r/PublicFreakout/comments/na7cd3/jewish_professor_calls_out_crocodile_tears/

    Irish MP reads out quotes made by Israeli Ministers in 2014 and 2015: https://www.youtube.com/watch?v=5utTDGS3B_Q

    Proving it’s become an Apartheid state: https://mirror.fro.wtf/reddit/post/3180033

    On the ground look: https://mirror.fro.wtf/reddit/post/3181262

    Also, skim reddit for posts. It can be useful to get the bigger threads and the comments following incidents as they often link to sources.

    https://old.reddit.com/r/PublicFreakout/comments/nce4cu/a_child_in_gaza_puts_whats_left_of_his_childhood/

    https://old.reddit.com/r/PublicFreakout/comments/nd6yu2/over_100k_peaceful_protestors_march_for_palestine/

    https://old.reddit.com/r/PraiseTheCameraMan/comments/n9yqmt/during_a_live_interview_there_was_sudden_missile/

    https://old.reddit.com/r/CatastrophicFailure/comments/na7jnu/palestinian_apartment_building_collapses_after/

    Although, that would probably be easier to do manually as automating that would take longer.
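
    A minimal sketch of the size/duration skim mentioned at the top of this comment, assuming ffprobe (ships with ffmpeg), GNU stat, and a flat `videos/` directory (an illustrative path):

    ~~~
    #!/bin/bash
    ## Tabulate size (bytes) and duration (seconds) for each video so that
    ## likely duplicates sort next to each other for manual review
    for f in videos/*; do
        size=$(stat -c%s "$f")
        dur=$(ffprobe -v error -show_entries format=duration \
                -of default=noprint_wrappers=1:nokey=1 "$f")
        printf '%s\t%.0f\t%s\n' "$size" "${dur:-0}" "$f"
    done | sort -n > video_index.tsv
    ~~~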

  14. “Saving” the Israeli-Palestinian conflict, which IS the most complicated conflict of our lifetime and spans many, many years, is not possible. Just to show you the futility of what you want to do: archaeological digs in the Middle East uncover cities with 10 to 15 layers of remains within a really short distance of one another, which means that parts of cities, or whole cities, were rebuilt 15 times over a span of years or decades! That figure is crazy and really shows that you can never capture this conflict in anything resembling its entirety. I am not saying it’s not a good goal; by all means, the more info we have the better, but as someone with a bit of knowledge of history I don’t see where you are going with it.

    You can capture the current conflict, which is presumably heading towards a war, but what exactly do you want to capture? Articles? Videos? Those have exactly zero value for capturing any valuable info for research or the general public. What you get is opinions, a death toll, and “Israel attacked 6 days in a row”. Any valuable info is captured by the intelligence services of Israel and the US and maybe the EU, but you cannot access it, and they have their files secured better than anyone; once they release them 50 years in the future you can have a better understanding of the situation, but until then? Save space for some other conflicts, maybe? Atrocities are currently going on in Ethiopia, and Hong Kong is still struggling. The Taiwan-China and Greece-Turkey conflicts are also important and ongoing right now, and simpler, but again, no intelligence reports = zero-value info.

    Edit: btw, you want to save video? From both sides? All of it? So you want to download the 24/7 streams of all the cameras in Israel, one of the most monitored countries in the world? Even if you tried to download ALL footage from the day they installed Iron Dome (2011), that’s 10 years, or 87,658 hours of footage. From just one camera at 1080p, that is at least 3.6 GB/hour in 8-bit [(according to this Quora answer on YouTube video quality)](https://www.quora.com/Videos-What-is-the-file-size-per-hour-of-recording-1080p-of-video), which is 315,568.8 GB per camera per 10 years. Multiply that by what, 1,000? 10,000? Even with worse-quality video you are looking at unimaginable amounts of footage. Just from one side of the conflict.
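
    For what it’s worth, the arithmetic above checks out; a quick reproduction in bash (bc handles the floating-point step):

    ~~~
    ## Reproducing the storage estimate above: ~10 years of 24/7 footage
    ## from one 1080p camera at the assumed 3.6 GB/hour
    hours=87658
    echo "$hours * 3.6" | bc    ## 315568.8 GB, i.e. roughly 315 TB per camera
    ~~~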

  15. As far as filtering goes, if you have something programmatic set up to ingest the data, you could try hashing everything you have saved now and comparing the hash of each new file to the set of hashes you already have. If you’re paranoid about hash collisions, you could use the filetype and the file size as part of the comparison key as well.

    You might also look into bloom filters. They’re a neat data structure that can tell you whether something is “probably” in the set or “definitely not” in the set; if you have a lot of data, using a bloom filter might be more efficient.
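
    A minimal sketch of the composite-key idea (hash plus size plus extension); `incoming/` and `index.txt` are illustrative names, and `stat -c%s` is the GNU form:

    ~~~
    #!/bin/bash
    ## Key = sha256 : size-in-bytes : extension, so a hash collision alone
    ## can't produce a false "already have it"
    touch index.txt
    for f in incoming/*; do
        key="$(sha256sum "$f" | cut -d' ' -f1):$(stat -c%s "$f"):${f##*.}"
        if ! grep -qF "$key" index.txt; then
            echo "$key" >> index.txt
            echo "new file: $f"
        fi
    done
    ~~~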

  16. Shit, I’ve got a couple TB I can spare. Lemme know if I can help.

  17. This was posted a few days ago; it is designed to track down duplicate files:

    Czkawka 3.1.0 – new version of my app to find duplicates, similar images, same music, broken files etc. from DataHoarder

    I haven’t used it personally; it has been on my list of things to do, but I end up just browsing Reddit and accomplishing nothing.

  18. Did something similar with the Beirut explosion footage. Here are some thoughts I wrote after finishing it:

    “One of the biggest problems I ran into while searching for footage was reposting. Nobody on Twitter uses the retweet button as it is intended to be used, so browsing by Top and Trending leads to mainly the same pictures and videos being repeated many times. These viral clips do cycle out through the day, so checking back throughout the day is best. Instagram is similar but harder to search, as it brings up more irrelevant content in searches. The best strategy I discovered for finding content was to use the search feature to find relevant posts, then go to the poster’s profile to see if they had uploaded other content they found. Reddit was a good source for some footage, but I didn’t find as much there as on sites focused on individual posts rather than a forum style. I had no success with Facebook, as its search feature was very lacking for this use case.

    Another source of images is reporters and news articles. Downloading from news articles will give you the best-quality images; look for articles specifically centered around images. Stock-image sites are also a great source but normally have lower resolution than news articles, as high-resolution images will not be provided for free. Getty Images was the best source I found and provided the majority of images, but I wasn’t able to archive them at a very high resolution (roughly 1024×700). I used https://tomato.to/ for the ones I grabbed from Getty Images. If I ever did something like this again, I’d probably try to automate that process, as it took several hours to do manually for hundreds of images.

    If you’re looking to get as much data as possible, don’t worry about duplicates. Download everything you see, even if you know you already have it. After you have finished downloading, you can use programs to help you find duplicates, deleting the lower-res copies and keeping the higher-res ones. Awesome Duplicate Photo Finder was my favorite tool for this. Because duplicates are fine, many people can collaborate on collecting data by combining their findings.”

  19. Don’t know if this helps, but… there are IPTV channels from Palestine and Israel running stuff 24/7. I don’t speak either language. The .ts streams can be recorded with TiviMate for Android, just to name one example; a desktop route is sketched below.
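
    A minimal desktop-side sketch, assuming you have an HLS playlist URL for the channel (the URL below is a placeholder): ffmpeg can copy the stream straight to a .ts file without re-encoding.

    ~~~
    ## Record one hour of a live IPTV/HLS stream to a timestamped .ts file;
    ## -c copy avoids re-encoding, -t caps the capture length
    ffmpeg -i "https://example.com/channel/playlist.m3u8" -c copy -t 01:00:00 "capture_$(date +%F_%H%M).ts"
    ~~~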

  20. There is something called News API; it could be interesting for you. As for the duplicate videos: I think it’s very hard to filter those automatically, because the news sites don’t just post the same mp4 unedited and uncut.
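
    A minimal sketch, assuming the comment means NewsAPI.org (`YOUR_KEY` stands in for a real API key); jq pulls the article URLs out of the JSON response:

    ~~~
    ## Query NewsAPI.org's "everything" endpoint for Gaza coverage and
    ## print just the article URLs (ready to feed to an archiver)
    curl -s "https://newsapi.org/v2/everything?q=Gaza&sortBy=publishedAt&apiKey=YOUR_KEY" \
        | jq -r '.articles[].url'
    ~~~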

  21. I’ve seen hundreds of examples of images from Iraq, Afghanistan, and Syria being posted as if they were part of the current Hamas/Israel conflict. Also, unless you have links to someone involved in the conflict on the ground, all you can save is what they want you to see, and with Israel and Palestine, what they want you to see and what’s happening are two very different things.