XServer-less webpage screenshot

Posted on September 16th, 2008


I know of no good way to take a screenshot of a web page without an X server; in fact, a proper solution would probably require hacking on a well-known renderer (mainly Gecko or WebKit).

Mozembed only lets you display the web page, so you need an X server. Several programs automate this, but it’s slow and not very elegant.

Well, I came up with my own solution. It’s far from perfect, yet much better. Using Qt4’s WebKit renderer and a fake X server (Xvfb, bundled with Xorg), you can have a command-line tool that renders pages perfectly!

Of course, this could easily be converted to C++, or you could probably call the WebKit library directly; maybe I’ll try that in the future. (Or then again, hack Qt so that it doesn’t request an X server connection, if that’s possible.)

Run as (for example):

xvfb-run -a ./thumbpage.py 1024x768 http://www.insecure.ws/ insecure

If you need it, you can uncomment the line that saves the full-page output (no scaling).

Script after the jump.

#!/usr/bin/python
# Copyright (c) kang@insecure.ws 2008
# Licensed under the terms of the GPLv3
# Requires Qt4 and PyQt4
 
import sys
from PyQt4.QtCore import *
from PyQt4.QtGui import *
from PyQt4.QtWebKit import *
 
app = QApplication(sys.argv) 
 
def usage():
        print "USAGE: "+sys.argv[0]+" <WIDTHxHEIGHT> <URL> <OUTPUT>"
        print
        print "Render the URL <URL> to a PNG thumbnail <OUTPUT>.png using WebKit"
        print "Example: "+sys.argv[0]+" 1024x768 http://www.insecure.ws/ insecure"
        sys.exit()
 
if (len(sys.argv) < 4):
        usage()        
 
try:
        dim = sys.argv[1].split("x")
        if len(dim) != 2:
                usage()
except IndexError:
        usage()
 
url = sys.argv[2]
out = sys.argv[3]
 
class Thumbnailer(QObject):
        def __init__(self, url, out, dim):
                # Initialise the QObject base class before using any Qt features
                QObject.__init__(self)
                self.url = url
                self.out = out
                self.dim = dim
                self.web = QWebView()
                self.page = self.web.page()
                print "Loading", self.url, "please wait a moment"
                self.web.load(QUrl(self.url))
 
        def render(self):
                print "Rendering"
                frame = self.page.currentFrame()
                self.page.setViewportSize(frame.contentsSize())
                img = QImage(self.page.viewportSize(), QImage.Format_ARGB32)
                paint = QPainter(img)
                frame.render(paint)
                paint.end()
 
                thumb = QImage(img.scaled(int(self.dim[0]), int(self.dim[1])))
                #img.save(self.out+"_full.png")
                if thumb.save(self.out+".png"):
                        print "Saved to",self.out+".png"
                else:
                        print "Failed to save"
                app.quit()
 
t = Thumbnailer(url, out, dim)
t.connect(t.page, SIGNAL("loadFinished(bool)"), t.render)
sys.exit(app.exec_())

This guy made a nicer version of this script, with Xvfb wrapped in, a check for load timeouts, etc.
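
A wrapper of the kind mentioned above (Xvfb wrapped in, plus a timeout) might look roughly like the sketch below. It is not the linked script. It is written for Python 3.5+, unlike the Python 2 script above, and it assumes the script is saved as ./thumbpage.py as in the run example. The names parse_size, build_command and make_thumbnail, and the 60-second default, are made up for illustration.

```python
import subprocess


def parse_size(text):
    """Parse 'WIDTHxHEIGHT' (e.g. '1024x768') into a (width, height) tuple."""
    parts = text.lower().split("x")
    if len(parts) != 2:
        raise ValueError("expected WIDTHxHEIGHT, got %r" % text)
    # int() raises ValueError if either part is not a number
    width, height = int(parts[0]), int(parts[1])
    if width <= 0 or height <= 0:
        raise ValueError("dimensions must be positive, got %r" % text)
    return width, height


def build_command(size, url, out, script="./thumbpage.py"):
    """Build the same xvfb-run command line as the run example in the post."""
    width, height = parse_size(size)
    return ["xvfb-run", "-a", script, "%dx%d" % (width, height), url, out]


def make_thumbnail(size, url, out, timeout=60):
    """Run the script under Xvfb; True if it exited with status 0 in time."""
    try:
        result = subprocess.run(build_command(size, url, out), timeout=timeout)
    except (subprocess.TimeoutExpired, OSError):
        # On timeout, subprocess.run() kills only the process it started
        # (xvfb-run); Xvfb or the script itself may need separate cleanup.
        # OSError covers xvfb-run not being installed.
        return False
    return result.returncode == 0
```

Calling make_thumbnail("1024x768", "http://www.insecure.ws/", "insecure") would then give up after a minute instead of waiting forever on a page that never finishes loading. Note that an exit status of 0 only means the script ended normally, not that the PNG was saved.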

12 Responses to “XServer-less webpage screenshot”

  1. Stefano says:

    I tried this but “render” is never called.
    I also moved the load after the t.connect to make sure it was not a race in events, but again it does not work.

    I don’t know anything about Python: is there a way I can debug what’s going on? I tried adding a time.sleep and self.render() to __init__, but I get a white image as output. From the web server logs I see that the page, CSS and images are indeed downloaded when I run the script… any idea?

  2. kang says:

    Either your framework does not have loadFinished(), in which case you need to update (but I doubt it), or loadFinished() is never called, which is likely.

    This may happen when a web page never finishes loading for some reason. I’m not sure what mechanisms are in place to prevent that, or whether they need to be set explicitly (I didn’t look it up; I might later if you need).

    By default I’d expect the load to time out along with the TCP connection, so after a minute it should render, and if it got nothing I suppose it appears blank.

  3. Stefano says:

    The page load completes (I have logs for all the requests on the web server). I also tried changing the connect to a different event, like load started / progress, but I never see a call.

    I use PyQt4-4.4.2, qt-4.4.1, qt-x11-4.4.1, sip-4.7.6, xorg-x11-server-Xvfb-1.4.99.901-29.20080415

    Can I ask which versions you used?

  4. Stefano says:

    I tested it on windows (Python 2.5.2, PyQT 4.4.3) and it worked out of the box.

    I made a few changes:
    self.web.setGeometry(0, 0, 800, 600); just after the QWebView instantiation, to make the default viewport 800×600 instead of 640×480.

    And I changed the thumbnail generation to use SmoothTransformation and crop the result back to the requested size:
    thumb = QImage(img.scaled(int(self.dim[0]), int(self.dim[1]), Qt.KeepAspectRatioByExpanding, Qt.SmoothTransformation).copy(0,0,int(self.dim[0]), int(self.dim[1])))

    It’s way better now, but I still have to understand why I can’t get it to work in my Linux+Xvfb environment. I don’t even know where to start :-(

  5. Stefano says:

    People on Fedora may be interested to know that the fc9 packages you can find for PyQt4 (e.g. PyQt4-4.4.2-2.fc9.i386.rpm) do not have WebKit support, so the script silently fails to run.

    Installing PyQt4-4.4.3-1.fc10.i386.rpm and sip-4.7.7-3.fc10.i386.rpm (even though they are fc10 packages, they installed smoothly on my fc9) fixed my problem and let me build thumbnails!

    Thank you for the great post!

    Stefano

  6. Ravishankar says:

    Thanks for this very informative post.

  7. Cybso. » Blog Archive » Create screenshots of a web page using Python and QtWebKit says:

    [...] 2009-04-01 Here’s another guy who had the same idea earlier than me. Posted by Roland Filed in 1 Tags: english, HowTo, Linux, [...]

  8. XServer-less webpage screenshot - C++ | insecure says:

    [...] While a better solution would still be to bypass Qt and render ‘by hand’ to an image file, or hack Qt to do so (without requiring any X server connection, and with fewer dependencies than the large Qt libraries…), I decided to run my script again, and I ran into trouble with python-qt4 on some Debian systems (and older Fedora systems, as a reader pointed out). [...]

  9. Podgląd strony do pliku jpg? : nme.pl says:

    [...] a solution based on Qt, worth [...]

  10. Reading ajax content programmatically » a Display of Patience says:

    [...] didn’t actually do anything. It just failed silently when I used it. Then I came across this: http://www.insecure.ws/2008/09/16/xserver-less-webpage-screenshot#comment-239. It seems version 4.4.2 of PyQt4 is sometimes build without support of WebKit. Guess which version [...]

  11. mariuz says:

    I have modified the #include lines this way (adding the greater-than and less-than signs):
    #include <Qt>
    #include <QtGui>
    #include <QWebView>
    #include <QWebFrame>

    and in wkthumb.pro i have added

    QT += network script webkit
    CONFIG += qt

    all worked ok
    the only issue i have now is with flash loading

  12. mariuz says:

    sorry for last post , it was intended for c++ article :)
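
The scale-then-crop change suggested in comment 4 (scale until the image covers the requested size, then cut it back from the top-left corner) comes down to a little arithmetic. Here is a minimal sketch of it in plain Python, without Qt. The function name cover_and_crop is made up for illustration, and the integer rounding may differ slightly from what Qt does internally.

```python
def cover_and_crop(src_w, src_h, dst_w, dst_h):
    """Return ((scaled_w, scaled_h), (x, y, w, h)) for scale-to-cover + top-left crop."""
    if min(src_w, src_h, dst_w, dst_h) <= 0:
        raise ValueError("all dimensions must be positive")
    # Compare aspect ratios by cross-multiplying, which avoids floats.
    if src_w * dst_h >= src_h * dst_w:
        # Source is relatively wider than the target: match the height,
        # and let the width overflow to the right.
        scaled = (max(dst_w, src_w * dst_h // src_h), dst_h)
    else:
        # Source is relatively taller (typical for a long web page): match
        # the width, and let the height overflow at the bottom.
        scaled = (dst_w, max(dst_h, src_h * dst_w // src_w))
    # The crop keeps the top-left dst_w x dst_h region, like copy(0, 0, w, h).
    return scaled, (0, 0, dst_w, dst_h)
```

For a long 800×2400 page and a 1024×768 thumbnail, this scales the page to 1024×3072 and keeps the top 1024×768 region, whereas the original script's plain scaled() call squashes the whole page into 1024×768.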
