Thomas’ Developer Blog

January 17, 2010

Whitelist HTML Tags (Advanced Methods for Prevention against JavaScript Injections)

NOTE: This method is no longer preferred. Please see Microsoft Anti-Cross Site Scripting Library V4.2
http://www.microsoft.com/download/en/details.aspx?id=28589 

Long time no update! I’m shocked to see I’m still getting over 100 posts a day considering I haven’t updated in months.

Well I wrote a little script to help everyone out who is using the HTMLeditor that ships with asp.net’s AJAX Control Toolkit. Hope you enjoy!

Imports System.Text.RegularExpressions

' Strips non-whitelisted tags, inline event handlers, and unsafe href values.
Function HTMLStream(ByVal InputValue As String, Optional ByVal WhiteList As String = "p|span|ol|li|ul|hr|div|i|b|h1|h2|h3|h4|a|br|img|font") As String
    Dim ReturnValue As String
    ' Part 2: remove everything from the first disallowed tag through the last one.
    ReturnValue = Regex.Replace(InputValue, "<(?!(" & WhiteList & ")\b)[^>]+>([^.]|[.])*(<(?!/?(" & WhiteList & ")\b)[^>]+>)", "", RegexOptions.IgnoreCase)
    ' Part 3: strip on* event handler attributes, repeating until none remain.
    While (Regex.IsMatch(ReturnValue, "(<[\s\S]*?) on.*?\=(['""])[\s\S]*?\2([\s\S]*?>)", RegexOptions.Compiled Or RegexOptions.IgnoreCase))
        ReturnValue = Regex.Replace(ReturnValue, "(<[\s\S]*?) on.*?\=(['""])[\s\S]*?\2([\s\S]*?>)", _
            Function(match As Match) [String].Concat(match.Groups(1).Value, match.Groups(3).Value), RegexOptions.Compiled Or RegexOptions.IgnoreCase)
    End While
    ' Part 4: drop double-quoted href values that do not start with http:// or www.
    ReturnValue = Regex.Replace(ReturnValue, "(?<=<.*)href=""(?!http://|www\.)[^""]*""", "", RegexOptions.IgnoreCase)
    Return ReturnValue
End Function

Now, if you want to know how this script works, you can continue reading. As a warning, I will be assuming that you know regex and intermediate VB.NET code. (If you want C#, there are a lot of conversion applications online.)

Part 1
The function starts off with two parameters: InputValue, which is self-explanatory, and the optional WhiteList. WhiteList is a pipe-separated list of the HTML tag names that will be accepted. By default it’s pretty generous.
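
To make the WhiteList format a bit more concrete, here is a minimal sketch of how the pipe-separated list behaves once it is spliced into a pattern. IsWhitelisted is a hypothetical helper I made up for this demo, it is not part of HTMLStream; it only reuses the same "(list)\b" test.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module WhiteListDemo
    ' Hypothetical helper, not part of the original function: applies the same
    ' "(list)\b" test that HTMLStream splices into its patterns.
    Function IsWhitelisted(ByVal tagName As String, ByVal whiteList As String) As Boolean
        Return Regex.IsMatch(tagName, "^(" & whiteList & ")\b", RegexOptions.IgnoreCase)
    End Function

    Sub Main()
        Dim allowed As String = "p|b|i|br"
        Console.WriteLine(IsWhitelisted("p", allowed))      ' True
        Console.WriteLine(IsWhitelisted("B", allowed))      ' True, because of IgnoreCase
        Console.WriteLine(IsWhitelisted("script", allowed)) ' False
        Console.WriteLine(IsWhitelisted("pre", allowed))    ' False, \b stops "p" from matching "pre"
    End Sub
End Module
```

The word boundary is what keeps a short entry like "p" or "b" from accidentally allowing longer tag names that merely start with the same letter.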

Part 2
ReturnValue = Regex.Replace(InputValue, "<(?!(" & WhiteList & ")\b)[^>]+>([^.]|[.])*(<(?!/?(" & WhiteList & ")\b)[^>]+>)", "", RegexOptions.IgnoreCase)

This line looks for a tag whose name is not in the WhiteList group and removes it along with everything that follows, up to and including the last non-whitelisted tag in the input. This is set up to be greedy! Why greedy? Because it’s for security! I don’t want to remove just the tag, I want to remove EVERYTHING inside of the tag. So be WARNED: any whitelisted markup or text sitting between two disallowed tags is removed as well, so altering the WhiteList may result in loss of user input.
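
Here is a small sketch of that greedy behavior, using the Part 2 pattern unchanged. RemoveDisallowed and the sample inputs are made up for the demo, and the whitelist is cut down to "p|b" to keep it short.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module GreedyRemovalDemo
    ' Hypothetical wrapper around the Part 2 pattern, for demonstration only.
    Function RemoveDisallowed(ByVal html As String, ByVal whiteList As String) As String
        Dim pattern As String = "<(?!(" & whiteList & ")\b)[^>]+>([^.]|[.])*(<(?!/?(" & whiteList & ")\b)[^>]+>)"
        Return Regex.Replace(html, pattern, "", RegexOptions.IgnoreCase)
    End Function

    Sub Main()
        Dim allowed As String = "p|b"

        ' The disallowed element and its contents are removed.
        Console.WriteLine(RemoveDisallowed("Hello <script>alert(1)</script> world", allowed))
        ' -> Hello  world

        ' Greedy: the allowed <b> between two disallowed elements is lost too.
        Console.WriteLine(RemoveDisallowed("A<script>1</script><b>keep?</b><iframe>2</iframe>Z", allowed))
        ' -> AZ

        ' The pattern needs a later disallowed tag to end the match, so a lone
        ' disallowed tag comes back unchanged.
        Console.WriteLine(RemoveDisallowed("<script>alert(1)", allowed))
        ' -> <script>alert(1)
    End Sub
End Module
```

The last case is worth keeping in mind when testing: the pattern only fires when there are at least two non-whitelisted tags to span.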

Part 3

While (Regex.IsMatch(ReturnValue, "(<[\s\S]*?) on.*?\=(['""])[\s\S]*?\2([\s\S]*?>)", RegexOptions.Compiled Or RegexOptions.IgnoreCase))
ReturnValue = Regex.Replace(ReturnValue, "(<[\s\S]*?) on.*?\=(['""])[\s\S]*?\2([\s\S]*?>)", _
Function(match As Match) [String].Concat(match.Groups(1).Value, match.Groups(3).Value), RegexOptions.Compiled Or RegexOptions.IgnoreCase)
End While

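This next part is a bit confusing. It goes the extra step most scripts don’t bother with, which is a shame, since they then fail to remove those pesky JavaScript event handlers (onclick, onerror and so on). The pattern captures everything in the tag before the handler (group 1) and everything after its quoted value (group 3), and the replacement glues those two groups back together, dropping the handler in between. A single pass can leave other handlers behind in the same tag, so the While loop keeps going until none are left.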
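
To see why the loop is needed, here is a rough sketch that runs the same pattern against a tag with two handlers and prints the string after each pass. StripEventHandlers and the sample tag are made up for the demo, and I left out RegexOptions.Compiled to keep it short.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module EventHandlerDemo
    ' Same pattern as Part 3.
    Const HandlerPattern As String = "(<[\s\S]*?) on.*?\=(['""])[\s\S]*?\2([\s\S]*?>)"

    ' Hypothetical wrapper around the Part 3 loop, for demonstration only.
    Function StripEventHandlers(ByVal html As String) As String
        Dim result As String = html
        While Regex.IsMatch(result, HandlerPattern, RegexOptions.IgnoreCase)
            ' Keep group 1 (before the handler) and group 3 (after its value).
            result = Regex.Replace(result, HandlerPattern, _
                Function(m As Match) m.Groups(1).Value & m.Groups(3).Value, _
                RegexOptions.IgnoreCase)
            Console.WriteLine("after pass: " & result)
        End While
        Return result
    End Function

    Sub Main()
        StripEventHandlers("<img src=""a.png"" onerror=""alert(1)"" onclick='x()'>")
        ' after pass: <img src="a.png" onclick='x()'>
        ' after pass: <img src="a.png">
    End Sub
End Module
```

The first pass swallows the whole tag as one match and only drops onerror, so onclick survives until the second pass.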

Part 4
ReturnValue = Regex.Replace(ReturnValue, "(?<=<.*)href=""(?!http://|www\.)[^""]*""", "", RegexOptions.IgnoreCase)

The final part goes through and removes JavaScript injected through the href attribute of anchor tags. It only allows links starting with “www.” or “http://”. You can modify this if you want to allow others such as ftp. Obviously this is to protect against those href="javascript:..." injections.
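
If you do want to allow other prefixes, one way to sketch it is to pass the allowed prefixes in as a regex alternation. StripHrefs, its allowedPrefixes parameter and the example.com link are all made up for the demo; the pattern is otherwise the one from Part 4.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module HrefFilterDemo
    ' Hypothetical helper: the Part 4 pattern with the allowed prefixes
    ' supplied by the caller as a regex alternation.
    Function StripHrefs(ByVal html As String, ByVal allowedPrefixes As String) As String
        Dim pattern As String = "(?<=<.*)href=""(?!" & allowedPrefixes & ")[^""]*"""
        Return Regex.Replace(html, pattern, "", RegexOptions.IgnoreCase)
    End Function

    Sub Main()
        Dim html As String = "<a href=""javascript:alert(1)"">x</a><a href=""https://example.com"">y</a>"

        ' Prefixes from the original function: https:// is not on the list,
        ' so that link is stripped along with the javascript: one.
        Console.WriteLine(StripHrefs(html, "http://|www\."))
        ' -> <a >x</a><a >y</a>

        ' Extended list (an assumption, adjust to taste): also allow https and ftp.
        Console.WriteLine(StripHrefs(html, "https?://|ftp://|www\."))
        ' -> <a >x</a><a href="https://example.com">y</a>
    End Sub
End Module
```

Note that with the original prefixes a plain https:// link is treated the same as a javascript: one, so you will probably want at least the "https?://" tweak.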

So now that you’ve got the basics you can go through and figure out the nitty gritty! Remember, as one developer wrote in a blurb, DO NOT ever let the attacker know whether they failed or passed. Otherwise you’re basically inviting them to try to figure out your code. You don’t want to do that!

Please read:
While I put a great deal of effort into this script, I did not write everything from scratch. A lot of people around the web have helped write the code you see above. I simply tweaked what they had and combined it into a far more secure function. So thanks to everyone who posted the original code that helped me write this. Sadly, there are too many to name offhand.


6 Comments »

  1. So I didn’t know anything about JavaScript injection until recently, when someone used this technique to affect our site. Now I’m doing research on this topic and came across your posting. I’m familiar with regex and understand what’s happening in your function above. What I don’t understand is how you use this function. When does it get called? etc. Thx

    Comment by Nick A — February 2, 2010 @ 6:06 pm

    • Nvm I just noticed that you initially wrote “a little script to help everyone out who is using the HTMLeditor that ships with asp.net’s AJAX Control Toolkit.”
      – my bad

      Comment by Nick A — February 2, 2010 @ 6:08 pm

  2. It does not really matter what control the value is coming from; the function accepts a string, attempts to sanitize it, and returns the clean string.

    You would use this in a code-behind on a VB.NET site in its current form.

    Comment by Anonymous — April 8, 2010 @ 2:18 am

