scrape:hidden_glasgow

Differences

This shows you the differences between two versions of the page.

scrape:hidden_glasgow [2020/11/20 17:38]
admin
scrape:hidden_glasgow [2020/11/20 18:10] (current)
admin
Line 35:
 grep -vi -E "(gif|jpg|png)$" http_links.txt > non-image-links.txt
 </code>
-22k links left. Now I'll work through them. First I'll run them through a script that will find which domains no longer exist and remove them. Then it'll search for 404s and move those into a separate list.
+22k links left. Now I'll work through them. First I'll run them through a [[Scraping/DNS filter]] script that will find which domains no longer exist and remove them. Then it'll search for 404s and move those into a separate list.
  
-Check out the [[Scraping]] page for some of the tools I use.
+Check out the [[/Scraping]] page for more information.
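
The DNS filtering step described in the changed paragraph could look roughly like the sketch below. This is not the actual [[Scraping/DNS filter]] script, which is not shown in this comparison. It is a minimal bash illustration that assumes `getent hosts` is available for lookups; the function names (`url_host`, `host_resolves`, `filter_links`), the `RESOLVER` variable and the output file names are all hypothetical. The 404 pass is not covered here.

```shell
#!/usr/bin/env bash
# Sketch only: split a list of links by whether their hostname still resolves.
# All names here are hypothetical; this is not the wiki's own DNS filter script.

# Print the hostname of an http(s) URL (drops scheme, path, userinfo and port).
# Note: IPv6 literal hosts such as [::1] are not handled.
url_host() {
    local rest="${1#*://}"        # strip the scheme
    rest="${rest%%[/?#]*}"        # strip path, query and fragment
    rest="${rest##*@}"            # strip any user:password@ prefix
    printf '%s\n' "${rest%%:*}"   # strip the port
}

# Succeed if the hostname resolves. RESOLVER can be overridden, e.g. for testing.
host_resolves() {
    ${RESOLVER:-getent hosts} "$1" > /dev/null 2>&1
}

# Read URLs from $1; write resolvable ones to $2 and the rest to $3.
filter_links() {
    local in="$1" live="$2" dead="$3" url
    : > "$live"
    : > "$dead"
    while IFS= read -r url; do
        [ -n "$url" ] || continue
        if host_resolves "$(url_host "$url")"; then
            printf '%s\n' "$url" >> "$live"
        else
            printf '%s\n' "$url" >> "$dead"
        fi
    done < "$in"
}
```

Under these assumptions it could be run as `filter_links non-image-links.txt live-links.txt dead-domains.txt`. With 22k links, caching the result per hostname would avoid repeated lookups for the same domain.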
scrape/hidden_glasgow.1605893931.txt.gz · Last modified: 2020/11/20 17:38 by admin