New: Link: Caja's HTML sanitizer for Javascript

[Edited to add: If you have questions or concerns about Caja, the Google Caja Discuss group is a good place to ask them.]

When you write a program that's supposed to be secure, you have to plan on security from the beginning; you can't bolt it on afterwards. The idiomatic way to describe a "plan" like "we'll write the program first and figure out the security later" is: "They're asking for some magical security fairy dust to sprinkle over their code."

I'm tweaking a Javascript program that takes HTML from someone else and renders it on a page. I thought my program was getting "sanitized" HTML; that is, HTML that had any potentially dangerous stuff removed. If I'm showing someone else's HTML on my page, I want to make sure that HTML doesn't have, for example, an <img src="http://sneaky.org/sneaky.gif"> in it. Otherwise, the webmaster of sneaky.org will know whenever someone reads my page.

I thought the program was getting sanitized HTML, but it was getting "raw" HTML, possibly chock-full of evil. Argh, I needed to bolt on some security. I went pleading to some of the security-minded folks for help. I was embarrassed--I 'fessed up that I needed some "magical security fairy dust". The amazing part is that those security-minded folks came through--they pointed me at Caja.

Caja is primarily a system for enforcing security "capabilities" in Javascript. But even if you don't need all of that, you might still want one part:

Caja comes with an XSS sanitizer for HTML that works with your JS code: html-sanitizer.js. You'll also need html4-defs.js. It looks like you need to build html4-defs.js via Ant. That's kinda annoying, but a lot easier than writing your own HTML sanitizer from scratch.
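Part of a sanitizer's job is deciding which URLs get to survive. Caja's sanitizer appears to let you supply a URL policy: a callback that receives each URL and returns either a URL to keep or null to drop it (check html-sanitizer.js for the exact function name and argument order). Here's a minimal sketch of such a policy as a standalone function that handles the sneaky.org case above. `sameOriginUrlPolicy` and `PAGE_ORIGIN` are hypothetical names made up for illustration, and `https://example.com` stands in for your own site.

```javascript
// Hypothetical stand-in for your own site's origin; not part of Caja.
const PAGE_ORIGIN = "https://example.com";

// Sketch of a URL policy: return a URL to keep it, or null to drop it.
// Only URLs that stay on PAGE_ORIGIN are allowed through.
function sameOriginUrlPolicy(rawUrl) {
  let parsed;
  try {
    // Resolve relative URLs (e.g. "/img/logo.gif") against our own site.
    parsed = new URL(String(rawUrl), PAGE_ORIGIN + "/");
  } catch (e) {
    return null; // Unparseable URL: drop it.
  }
  // Reject non-http(s) schemes such as javascript: and data:.
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    return null;
  }
  // Reject anything pointing at another host (e.g. sneaky.org).
  if (parsed.origin !== PAGE_ORIGIN) {
    return null;
  }
  return parsed.href;
}
```

For example, `sameOriginUrlPolicy("http://sneaky.org/sneaky.gif")` returns `null`, so that tracking image's URL wouldn't survive, while `sameOriginUrlPolicy("/img/logo.gif")` returns `"https://example.com/img/logo.gif"`. This covers only the URL half of the problem; the tag and attribute rules are exactly the part worth leaving to Caja.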

I looked over the source code. It's checking for bad stuff I hadn't thought to check for. I sure am glad that folks more knowledgeable than me are working on this thing.


Posted 2008-12-19

 Jasvir Nagra said...

Hi Larry! Great use of Caja! :) We don't have a "1.0" release of Caja, but we do push updates at http://google-caja.googlecode.com/svn/maven/caja/caja/*/caja-*.jar. This jar already contains a pre-built html4-defs.js, which removes the dependency on Ant.

Plus, we also include html-sanitizer-minified.js in that jar. It is html4-defs.js, css-defs.js and html-sanitizer.js concatenated and minified: a single file with the same functionality but fewer calories, so it's smaller and downloads quicker.


Regards
Your Friendly Neighborhood Cajadore

23 December, 2008 20:11
 RichB said...

In addition to being in the JAR, html4-defs.js used to live in the svn repository:

http://code.google.com/p/google-caja/source/browse/trunk/src/com/google/caja/plugin/html4-defs.js?spec=svn2833&r=2466

29 April, 2009 07:38