Find the largest image in an HTML page

I recently ran into the problem of finding the largest image in an HTML page. This can be useful in page summarisation tasks, for example.

I found this Scala project, which looked promising but failed to compile for me:
https://github.com/GravityLabs/goose

Instead, here is a quick PHP script that finds the largest image.

https://gist.github.com/tilayealemu/2519f54222ad9a7d7ff2b294b2b32a09
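The gist isn't reproduced here. To illustrate the basic idea, here is a minimal Python sketch. It is not a port of the PHP script and does not claim to match its logic. It picks the largest image by the `width` and `height` attributes declared on `<img>` tags. This assumes those attributes are present. Images without plain integer dimensions are skipped. A more robust version would download each image and measure its actual size.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class ImgCollector(HTMLParser):
    """Collects (src, width, height) for every <img> tag with integer dimensions."""

    def __init__(self) -> None:
        super().__init__()
        self.images: List[Tuple[str, int, int]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        a = dict(attrs)
        src = a.get("src")
        try:
            width = int(a.get("width", ""))
            height = int(a.get("height", ""))
        except (TypeError, ValueError):
            # Missing, empty or non-numeric (e.g. "100%") dimensions: skip this image.
            return
        if src:
            self.images.append((src, width, height))


def largest_image(html: str) -> Optional[str]:
    """Return the src of the <img> with the largest declared area, or None."""
    parser = ImgCollector()
    parser.feed(html)
    if not parser.images:
        return None
    src, _, _ = max(parser.images, key=lambda img: img[1] * img[2])
    return src


if __name__ == "__main__":
    page = ('<img src="logo.png" width="100" height="30">'
            '<img src="hero.jpg" width="800" height="450"/>'
            '<img src="icon.gif">')
    print(largest_image(page))  # hero.jpg
```

Self-closing tags such as `<img ... />` are also covered, because `HTMLParser` routes them through `handle_starttag` by default.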

Example call:

http://localhost/largest.php?url_without_http=bbc.co.uk


Notes:

  • Pass the page address in the url_without_http parameter, without the “http://” prefix
  • The “is_suitable” function performs additional checks, such as rejecting images that look like banners
  • If you get poor results, consider extending this function to suit your needs. For my own use I have additional checks on the name and URL of the image.
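As a rough illustration of the kind of filtering described in these notes, here is a hypothetical Python filter. It is not the gist's `is_suitable`. The function name `looks_suitable`, the 4:1 banner ratio, the 50 px minimum side and the list of URL keywords are all assumptions chosen for the example. Images are given as `(src, width, height)` tuples, however you extract them.

```python
from typing import Iterable, Optional, Tuple

Image = Tuple[str, int, int]  # (src, width, height)

# Hypothetical thresholds; tune them for the pages you process.
MAX_ASPECT_RATIO = 4.0  # wider (or taller) than 4:1 is treated as a banner
MIN_SIDE_PX = 50        # anything smaller is treated as an icon or spacer
BAD_URL_WORDS = ("logo", "banner", "sprite", "icon", "pixel")


def looks_suitable(src: str, width: int, height: int) -> bool:
    """Hypothetical filter: reject tiny images, banner-shaped images and suspicious URLs."""
    if width < MIN_SIDE_PX or height < MIN_SIDE_PX:
        return False
    # Both sides are at least MIN_SIDE_PX here, so there is no division by zero.
    if max(width, height) / min(width, height) > MAX_ASPECT_RATIO:
        return False
    lowered = src.lower()
    return not any(word in lowered for word in BAD_URL_WORDS)


def largest_suitable(images: Iterable[Image]) -> Optional[str]:
    """Return the src of the largest image that passes the filter, or None."""
    candidates = [img for img in images if looks_suitable(*img)]
    if not candidates:
        return None
    return max(candidates, key=lambda img: img[1] * img[2])[0]


if __name__ == "__main__":
    images = [
        ("wide-strip.jpg", 970, 90),   # rejected: aspect ratio above 4:1
        ("site-logo.png", 200, 200),   # rejected: "logo" in the URL
        ("story.jpg", 640, 360),       # kept
        ("spacer.gif", 1, 1),          # rejected: too small
    ]
    print(largest_suitable(images))  # story.jpg
```

Without a filter like this, the wide strip would win on area alone (87,300 px versus 230,400 px for the story image is close enough that a slightly larger banner could take over). This is why the aspect-ratio and name checks matter in practice.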