XServer-less webpage screenshot
Posted on September 16th, 2008
There’s no good solution for this that I know of. In fact, it would probably require hacking up one of the well-known renderers (mainly Gecko or WebKit).
Mozembed only lets you display the web page, so you need an X server. Several programs automate this, but it’s slow and not very elegant.
Well, I came up with my own solution. It’s far from perfect, but much better: using Qt4’s WebKit renderer and the fake X server (Xvfb, bundled with Xorg), you can have a command-line tool that renders pages perfectly!
Of course, this could easily be converted to C++, or you could probably call the WebKit library directly; maybe I’ll try that in the future. (Or, then again, hack Qt not to request an X server connection, if that’s possible.)
Run as (for example):
xvfb-run -a ./thumbpage.py 1024x768 http://www.insecure.ws/ insecure
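If you want to thumbnail several pages in one go, the same command line can be driven from Python. This is a minimal Python 3 sketch, assuming the script is saved as ./thumbpage.py and that xvfb-run is on the PATH; the helper names thumbnail_command and thumbnail_all are hypothetical.

```python
import subprocess


def thumbnail_command(size, url, name, script="./thumbpage.py"):
    """Build the xvfb-run command line for one page (hypothetical helper)."""
    # -a tells xvfb-run to pick a free X server number automatically
    return ["xvfb-run", "-a", script, size, url, name]


def thumbnail_all(pages, size="1024x768"):
    """Render each (url, name) pair in turn and return the exit codes."""
    codes = []
    for url, name in pages:
        result = subprocess.run(thumbnail_command(size, url, name))
        codes.append(result.returncode)
    return codes
```

For example, thumbnail_all([("http://www.insecure.ws/", "insecure")]) runs exactly the command shown above. Pages are rendered one after another, so each run gets its own Xvfb instance.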
If you need it, you can uncomment the full-page output (no scaling).
The script:
#!/usr/bin/python
# Copyright (c) kang@insecure.ws 2008
# Licensed under the terms of the GPLv3
# Requires libqt4 and PyQt4 (built with QtWebKit support)
import sys
from PyQt4.QtCore import *
from PyQt4.QtGui import *
from PyQt4.QtWebKit import *

app = QApplication(sys.argv)

def usage():
    print "USAGE: " + sys.argv[0] + " <WIDTHxHEIGHT> <URL> <OUTPUT>"
    print
    print "Render the URL to a PNG thumbnail <OUTPUT>.png using WebKit"
    print "Example: " + sys.argv[0] + " 1024x768 http://www.insecure.ws/ insecure"
    sys.exit()

if len(sys.argv) < 4:
    usage()

dim = sys.argv[1].split("x")
if len(dim) != 2:
    usage()

url = sys.argv[2]
out = sys.argv[3]

class Thumbnailer(QObject):
    def __init__(self, url, out, dim):
        # The QObject base must be initialised before connect() can be used
        QObject.__init__(self)
        self.url = url
        self.out = out
        self.dim = dim
        self.web = QWebView()
        self.page = self.web.page()
        print "Loading", self.url, "please wait a moment"
        # Loading is asynchronous: nothing is emitted until app.exec_() runs
        self.web.load(QUrl(self.url))

    def render(self, ok=True):
        # Slot for loadFinished(bool); ok is False if the load failed
        print "Rendering"
        frame = self.page.currentFrame()
        # Grow the viewport to the whole page so nothing is clipped
        self.page.setViewportSize(frame.contentsSize())
        img = QImage(self.page.viewportSize(), QImage.Format_ARGB32)
        paint = QPainter(img)
        frame.render(paint)
        paint.end()
        # Scale the full page down to the requested thumbnail size
        thumb = QImage(img.scaled(int(self.dim[0]), int(self.dim[1])))
        #img.save(self.out + "_full.png")  # full page, no scaling
        if thumb.save(self.out + ".png"):
            print "Saved to", self.out + ".png"
        else:
            print "Failed to save"
        app.quit()

t = Thumbnailer(url, out, dim)
t.connect(t.page, SIGNAL("loadFinished(bool)"), t.render)
sys.exit(app.exec_())
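The script only checks that the size argument splits into two parts, so a value such as 1024xabc gets past the check and then fails with a traceback at int(). A stricter parser is sketched below as a hypothetical parse_size helper; it uses only the re module and should work under both Python 2 and Python 3.

```python
import re

# WIDTHxHEIGHT, ASCII digits only, nothing else allowed
_SIZE_RE = re.compile(r"([0-9]+)x([0-9]+)\Z")


def parse_size(text):
    """Parse 'WIDTHxHEIGHT' into a (width, height) tuple of positive ints.

    Returns None for anything malformed so the caller can fall back to usage().
    """
    match = _SIZE_RE.match(text.strip())
    if match is None:
        return None
    width, height = int(match.group(1)), int(match.group(2))
    if width == 0 or height == 0:
        return None
    return width, height
```

In the script, dim = parse_size(sys.argv[1]) followed by if dim is None: usage() could replace the split check, and the int() calls in render() would then be unnecessary.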
This guy made a nicer version of this script, with Xvfb wrapped in, a check for loading timeouts, and more.
October 6th, 2008 at 9:46 am
I tried this but “render” is never called.
I also moved the load call after t.connect to rule out a race between events, but it still doesn’t work.
I don’t know anything about Python: is there a way to debug what’s going on? I tried adding a time.sleep and a call to self.render() in __init__, but I get a white image as output. From the web server logs I can see that the page, CSS and images are indeed downloaded when I run the script… any idea?
October 7th, 2008 at 2:10 pm
Either your framework does not have loadFinished(), in which case you need to update (though I doubt it), or loadFinished() is never emitted, which is more likely.
This may happen when a web page never finishes loading for some reason. I’m not sure what mechanism is in place to prevent that, or whether a timeout needs to be set explicitly (I didn’t look it up; I might later if you need it).
By default I’d expect the load to time out along with the TCP connection, so it should render after a minute or so; if nothing was received, I suppose the result is blank.
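Rather than relying on a TCP timeout, one way to make sure a batch job never hangs on a page that never finishes loading is to enforce a hard limit from outside the script. This is a minimal Python 3 sketch for POSIX systems; run_with_timeout is a hypothetical helper and the 60-second default is an arbitrary choice, not a WebKit setting. Inside the script, a QTimer.singleShot that calls render() or app.quit() after a deadline would be the in-process alternative.

```python
import os
import signal
import subprocess


def run_with_timeout(cmd, seconds=60):
    """Run cmd; return True only if it exits with status 0 within `seconds`."""
    # start_new_session=True puts the child in a new process group, so anything
    # it spawns (for xvfb-run: Xvfb and the rendering script) can be killed too
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=seconds) == 0
    except subprocess.TimeoutExpired:
        try:
            # Kill the whole group, not just the xvfb-run wrapper shell script
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group exited between the timeout and the kill
        proc.wait()  # reap the child so it does not linger as a zombie
        return False
```

Usage, assuming the command line from the post: run_with_timeout(["xvfb-run", "-a", "./thumbpage.py", "1024x768", "http://www.insecure.ws/", "insecure"], 90).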
October 7th, 2008 at 7:56 pm
The page load completes (I have log entries for all the requests on the web server). I also tried connecting to a different event, such as loadStarted or loadProgress, but none of them is ever called.
I use PyQt4-4.4.2, qt-4.4.1, qt-x11-4.4.1, sip-4.7.6, xorg-x11-server-Xvfb-1.4.99.901-29.20080415
Can I ask which versions you used?
October 8th, 2008 at 9:24 pm
I tested it on Windows (Python 2.5.2, PyQt 4.4.3) and it worked out of the box.
I made a few changes:
I added self.web.setGeometry(0, 0, 800, 600) just after the QWebView instantiation, to make the default viewport 800×600 instead of 640×480.
I also changed the thumbnail generation to use Qt.SmoothTransformation and to crop the result back to the requested size:
thumb = QImage(img.scaled(int(self.dim[0]), int(self.dim[1]), Qt.KeepAspectRatioByExpanding, Qt.SmoothTransformation).copy(0,0,int(self.dim[0]), int(self.dim[1])))
It’s way better now, but I still need to understand why I can’t get it to work in my Linux + Xvfb environment. I don’t even know where to start.
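To see what this scale-then-crop line does to a tall page, here is a small pure-Python model of the geometry (no Qt needed). It follows Qt’s documented meaning of Qt.KeepAspectRatioByExpanding (scale to the smallest size that covers the target box while keeping the aspect ratio), followed by the copy(0, 0, w, h) crop. The integer truncation is an assumption about Qt’s rounding, so individual pixels may differ; the function names are hypothetical.

```python
def expanding_size(src_w, src_h, box_w, box_h):
    """Smallest size with the source aspect ratio that covers box_w x box_h."""
    # Width we would get by matching the box height exactly
    width_at_box_h = box_h * src_w // src_h
    if width_at_box_h >= box_w:
        return width_at_box_h, box_h
    # Otherwise match the box width and let the height overflow
    return box_w, box_w * src_h // src_w


def thumbnail_geometry(src_w, src_h, box_w, box_h):
    """Return (scaled size, crop rectangle) for the scale-then-copy approach."""
    scaled = expanding_size(src_w, src_h, box_w, box_h)
    # copy(0, 0, box_w, box_h) keeps the top-left corner of the scaled image;
    # it always fits because the scaled size covers the box in both directions
    crop = (0, 0, box_w, box_h)
    return scaled, crop
```

For a 1024×3000 page and a 200×150 thumbnail, the page is scaled to 200×585 and the crop keeps its top 150 rows, which correspond to roughly the first 770 pixels of the original page. The original script’s img.scaled(w, h) uses Qt’s default Qt.IgnoreAspectRatio instead, which squashes the whole page into the box.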
October 8th, 2008 at 10:12 pm
It may interest people on Fedora that the fc9 PyQt4 packages you can find (e.g. PyQt4-4.4.2-2.fc9.i386.rpm) do not have WebKit support, and the script silently fails with them.
Installing PyQt4-4.4.3-1.fc10.i386.rpm and sip-4.7.7-3.fc10.i386.rpm (even though they are fc10 packages, they installed smoothly on my fc9) fixed my problem and let me build thumbnails!
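Since a PyQt4 build without WebKit is easy to miss, a quick environment check before running the script can help, and it also answers the "which versions?" question above. This is a hedged sketch: it reports the Python version and, if PyQt4 is importable, the PYQT_VERSION_STR and QT_VERSION_STR constants from PyQt4.QtCore plus whether PyQt4.QtWebKit imports. It only detects a missing QtWebKit module outright; a build that imports but renders nothing would still need a test render. The helper names are hypothetical.

```python
import importlib
import platform


def module_available(name):
    """Return True if the (dotted) module name can be imported."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


def environment_report():
    """Collect the version details that matter for the thumbnail script."""
    report = {"python": platform.python_version()}
    if not module_available("PyQt4.QtCore"):
        report["pyqt4"] = None  # PyQt4 is not installed at all
        return report
    from PyQt4 import QtCore
    report["pyqt4"] = QtCore.PYQT_VERSION_STR
    report["qt"] = QtCore.QT_VERSION_STR
    # False here matches the broken-package symptom described in this comment
    report["qtwebkit"] = module_available("PyQt4.QtWebKit")
    return report


if __name__ == "__main__":
    for key, value in sorted(environment_report().items()):
        print("%s: %s" % (key, value))
```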
Thank you for the great post!
Stefano
November 14th, 2008 at 12:37 pm
Thanks for this very informative post.
April 1st, 2009 at 2:26 pm
[...] 2009-04-01 Here’s another guy who had the same idea earlier than me. Posted by Roland [...]
April 2nd, 2009 at 6:38 pm
[...] While a better solution would still be to bypass Qt and render ‘by hand’ to an image file, or to hack Qt to do so (without requiring any X server connection, and with fewer dependencies than the large Qt libraries…), I decided to run my script again, and I ran into trouble with python-qt4 on some Debian systems (and on older Fedora systems, as a reader pointed out). [...]
August 25th, 2009 at 9:32 am
[...] a solution based on Qt, it’s worth [...]
October 28th, 2009 at 11:44 pm
[...] didn’t actually do anything. It just failed silently when I used it. Then I came across this: http://www.insecure.ws/2008/09/16/xserver-less-webpage-screenshot#comment-239. It seems version 4.4.2 of PyQt4 is sometimes built without WebKit support. Guess which version [...]
November 23rd, 2009 at 6:24 pm
I modified the #include lines this way (adding the less-than and greater-than signs):
#include <Qt>
#include <QtGui>
#include <QWebView>
#include <QWebFrame>
and in wkthumb.pro I added:
QT += network script webkit
CONFIG += qt
and everything worked.
The only issue I have now is with Flash loading.
November 23rd, 2009 at 6:43 pm
Sorry, my last post was intended for the C++ article.