[**tl;dr;** Viewable at https://romanport.com/p/cbsindex/viewer.html] __No videos have been downloaded, only URLs to them.__
A few days ago, out of curiosity, I looked at the API used on the website for one of my local TV stations, [WCCO](https://minnesota.cbslocal.com/video/category/news/). Normally, videos on that website disappear once they’re pushed past the 10th page (~2 weeks). However, I noticed that video IDs were stored (kind of) sequentially. I also found out that the server that handles the metadata for these IDs has no rate limit. You know where this is going…
I quickly set up a program to search through all video IDs and save the valid ones along with their metadata (in the .smil format) and HTTP headers. It was able to search these video IDs surprisingly quickly, at about 150 IDs/second. What I found was much more than I expected.
In total, I found metadata for **1,143,894** videos going back to 2015 published by **every major CBS affiliate TV station in the US**, of which I’d estimate **400,000** still have the videos accessible. It seems that the video files are removed from the server about two years after they’re published, but the metadata isn’t removed.
Obviously, I can’t download 400,000 videos. That’s what this subreddit is for though. I’m hoping someone will find this index useful. I think that having these clips stored safely in a public archive would be beneficial to archiving history, but I don’t have the disk space or the internet connection to do so.
## The index
I’ve built a simple web viewer to view videos that are likely still watchable. You can access it below, just keep in mind that it downloads a 70 MB JSON index file. https://romanport.com/p/cbsindex/viewer.html
* I’ve also uploaded the raw index in two formats. One is a smaller, human-readable .txt file that lists the ID, URL, and (HTTP header) timestamp. It only includes one quality level and is really only useful for browsing. [Download (59 MB gzipped, 231 MB ungzipped)](https://romanport.com/p/cbsindex/output.txt.gz)
* The other is a more advanced binary file containing the raw data I downloaded. The data is stored in the following custom binary format, repeated until EOF, then gzipped. The SMIL content is the metadata downloaded from the server; it contains 3–4 URLs for various bitrates/quality levels. [Download (231 MB gzipped, 1,705 MB ungzipped)](https://romanport.com/p/cbsindex/output.bin.gz)
Binary format (old Reddit seems to break this if it’s following bullet points)
Name | Size (bytes) | Offset | Info
---|---|---|---
Magic | 4 | 0 | “DATA” in ASCII
ID | 4 | 4 | The original ID it was requested with
Header Len | 2 | 8 | Length of the saved HTTP headers in-file
Content Len | 2 | 10 | Length of the SMIL data
Headers | ? | 12 | Raw HTTP headers from the request
Content | ? | ? | Raw SMIL file (XML)
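The record layout above can be read back with a short script. Here’s a sketch in Python; note that the post doesn’t state the byte order of the integer fields, so little-endian is an assumption — flip the `struct` prefix to `>` if the magic check fails:

```python
import gzip
import struct

def read_records(path):
    """Yield (video_id, headers, smil) tuples from the gzipped index.

    Assumes little-endian integers; the original post doesn't state
    byte order, so adjust the struct format if records fail to parse.
    """
    with gzip.open(path, "rb") as f:
        while True:
            # Fixed-size prefix: magic (4) + id (4) + header len (2) + content len (2)
            fixed = f.read(12)
            if len(fixed) < 12:
                break  # clean EOF
            magic, vid, hlen, clen = struct.unpack("<4sIHH", fixed)
            if magic != b"DATA":
                raise ValueError("bad magic: wrong offset or byte order")
            headers = f.read(hlen).decode("latin-1")
            smil = f.read(clen).decode("utf-8", errors="replace")
            yield vid, headers, smil
```

Each record is self-delimiting (the two length fields tell you how far to read), so the whole file can be streamed without loading 1.7 GB into memory.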
Unfortunately, the date of these files is a bit hard to pin down. The HTTP “Last-Modified” header does contain a date, and so does the URL path, but they often conflict. I’ve also found files with dates in the future and dates close to the Unix epoch. There’s likely another API that could be queried to get this information, though.
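If you want to pull a best-guess date out of a saved header block, the “Last-Modified” value is in the standard HTTP date format and parses with the stdlib. A sketch (which date source to trust is still the open question noted above):

```python
from email.utils import parsedate_to_datetime

def last_modified(raw_headers):
    """Return the Last-Modified date from a raw HTTP header block, or None."""
    for line in raw_headers.splitlines():
        if line.lower().startswith("last-modified:"):
            # "Last-Modified: Wed, 21 Oct 2015 07:28:00 GMT" -> datetime
            return parsedate_to_datetime(line.split(":", 1)[1].strip())
    return None
```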
It appears that the video files are automatically removed from the server about two years after they’re published, so it’d be most important to download those first. This is just a guess after browsing through the files.
## How the index was built
While looking through the API requests made by WCCO’s website, I discovered that I could get metadata for video IDs at the URL "http://cbslocal-download.storage.googleapis.com/anv-videos/variant/&lt;ID&gt;.smil". I also noticed that IDs are (kind of) sequential. What I mean by "kind of" is that there are gaps between valid IDs.
I took the ID of the latest video I could find, "5,540,889", and just started counting backwards. The program I wrote could check about 150 IDs/second, so I let it run overnight. When I next checked on it, it was at ID "891,880" and had stopped finding valid IDs, so I stopped it there.
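A minimal version of such a scanner might look like the following. This is my own sketch, not the original program: the endpoint and the starting ID come from the post, while the thread count and timeout are arbitrary choices:

```python
import concurrent.futures
import urllib.error
import urllib.request

# Metadata endpoint discovered from WCCO's API traffic (see above).
BASE = "http://cbslocal-download.storage.googleapis.com/anv-videos/variant/{}.smil"

def probe(video_id):
    """Return (id, headers, smil bytes) if the ID exists, else None."""
    try:
        with urllib.request.urlopen(BASE.format(video_id), timeout=10) as r:
            return video_id, dict(r.headers), r.read()
    except (urllib.error.HTTPError, urllib.error.URLError):
        return None  # 404 or network error: treat as a gap in the ID space

def scan(start, stop, workers=32):
    """Count down from `start` (exclusive of `stop`), yielding valid hits."""
    ids = range(start, stop, -1)
    with concurrent.futures.ThreadPoolExecutor(workers) as pool:
        for hit in pool.map(probe, ids):
            if hit:
                yield hit
```

With a thread pool like this, throughput in the low hundreds of requests per second (the ~150/second the post mentions) is plausible, since each probe is a small metadata fetch rather than a video download.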
## Downloading content
As I said, this is just an index. I have not actually saved any videos. I just have URLs to videos.
I threw together a WinForms downloader that’ll download from a certain region/station between two dates. Wrote it in 25 minutes, so it might have bugs. Windows build [here](https://romanport.com/p/cbsindex/CbsDownloaderBin.zip), C# source [here](https://romanport.com/p/cbsindex/CbsDownloaderSrc.zip). When you first run it, it’ll download the 231 MB index file.