HTML escaping replaces characters that have special meaning in HTML, such as <, >, &, ", and ', with entity references so they can be safely included in HTML documents; unescaping converts those references back into the original characters.
Here's how to handle HTML escape and unescape in both Java and .NET:
1. HTML Escape/Unescape in Java
In Java, you can use the StringEscapeUtils class from the Apache Commons Text library for HTML escaping and unescaping. Commons Lang 3 also ships an org.apache.commons.lang3.StringEscapeUtils class, but it has been deprecated since version 3.6 in favor of the Commons Text version. If you prefer not to use a library, I'll also show you how to implement these methods manually.
Using Apache Commons Text for HTML Escape/Unescape
First, add the Apache Commons Text library to your project.
```xml
<!-- In Maven (pom.xml) -->
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-text</artifactId>
    <version>1.10.0</version> <!-- or later; check Maven Central for the latest version -->
</dependency>
```
HTML Escape Example (Using StringEscapeUtils)
```java
import org.apache.commons.text.StringEscapeUtils;

public class HTMLEscapeExample {
    public static void main(String[] args) {
        String input = "<div>Hello & welcome!</div>";
        String escaped = StringEscapeUtils.escapeHtml4(input);
        System.out.println(escaped); // Output: &lt;div&gt;Hello &amp; welcome!&lt;/div&gt;
    }
}
```
HTML Unescape Example (Using StringEscapeUtils)
```java
import org.apache.commons.text.StringEscapeUtils;

public class HTMLUnescapeExample {
    public static void main(String[] args) {
        String escaped = "&lt;div&gt;Hello &amp; welcome!&lt;/div&gt;";
        String unescaped = StringEscapeUtils.unescapeHtml4(escaped);
        System.out.println(unescaped); // Output: <div>Hello & welcome!</div>
    }
}
```
Manual HTML Escape and Unescape in Java
If you prefer not to use external libraries, you can write your own methods to escape and unescape HTML entities:
HTML Escape (Manual Method)
```java
public class HTMLManualEscape {
    // Replace '&' first; otherwise the '&' inside the entities added by the
    // later replacements would be escaped again (e.g. "&lt;" -> "&amp;lt;").
    public static String escapeHTML(String input) {
        return input.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&#39;");
    }

    public static void main(String[] args) {
        String input = "<div>Hello & welcome!</div>";
        String escaped = escapeHTML(input);
        System.out.println(escaped); // Output: &lt;div&gt;Hello &amp; welcome!&lt;/div&gt;
    }
}
```
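The chained replace calls make five passes over the string, and they only work because '&' is handled first. One alternative is a single pass that looks at each character once, as in the minimal sketch below. The class name HtmlEscapeSinglePass and the choice to return null for null input are illustrative assumptions, not part of any library.

```java
public class HtmlEscapeSinglePass {
    // Escapes &, <, >, " and ' in one pass; each input character is examined once,
    // so the order in which the cases are listed does not matter.
    // Assumption for this sketch: null input is returned unchanged.
    public static String escapeHtml(String input) {
        if (input == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeHtml("<a title=\"it's\">Tom & Jerry</a>"));
        // prints: &lt;a title=&quot;it&#39;s&quot;&gt;Tom &amp; Jerry&lt;/a&gt;
    }
}
```

Because no replacement output is ever scanned again, this version cannot escape its own entities twice, whatever the order of the switch cases.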
HTML Unescape (Manual Method)
```java
public class HTMLManualUnescape {
    // Decode "&amp;" last; decoding it first would turn "&amp;lt;" into "&lt;"
    // and then into "<", unescaping the text twice.
    public static String unescapeHTML(String input) {
        return input.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&#39;", "'")
                .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        String escaped = "&lt;div&gt;Hello &amp; welcome!&lt;/div&gt;";
        String unescaped = unescapeHTML(escaped);
        System.out.println(unescaped); // Output: <div>Hello & welcome!</div>
    }
}
```
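A chained-replace unescaper understands only the entities it lists, and it works correctly only if &amp; is decoded last. The minimal single-pass sketch below also decodes decimal and hexadecimal numeric references such as &#39; and &#x27;. It assumes Java 9 or later, for Map.of and Matcher.appendReplacement(StringBuilder, String). The class name is hypothetical, and unknown named entities such as &nbsp; are deliberately left untouched.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HtmlUnescapeSinglePass {
    // Only these named entities are recognized; full HTML defines far more.
    private static final Map<String, String> NAMED = Map.of(
            "amp", "&", "lt", "<", "gt", ">", "quot", "\"", "apos", "'");

    // Matches &#xHEX; (group 1), &#DECIMAL; (group 2) or &name; (group 3).
    // Digit counts are capped so Integer.parseInt cannot overflow.
    private static final Pattern ENTITY = Pattern.compile(
            "&(?:#[xX]([0-9a-fA-F]{1,6})|#([0-9]{1,7})|([a-zA-Z]+));");

    public static String unescapeHtml(String input) {
        if (input == null) {
            return null;
        }
        Matcher m = ENTITY.matcher(input);
        StringBuilder out = new StringBuilder(input.length());
        while (m.find()) {
            String replacement;
            if (m.group(1) != null) {
                replacement = fromCodePoint(Integer.parseInt(m.group(1), 16), m.group());
            } else if (m.group(2) != null) {
                replacement = fromCodePoint(Integer.parseInt(m.group(2)), m.group());
            } else {
                // Unknown names (e.g. "&nbsp;") are kept exactly as written.
                replacement = NAMED.getOrDefault(m.group(3), m.group());
            }
            // quoteReplacement stops '$' and '\' from being treated specially.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Converts a numeric reference to text, leaving out-of-range values as written.
    private static String fromCodePoint(int codePoint, String original) {
        return Character.isValidCodePoint(codePoint)
                ? new String(Character.toChars(codePoint))
                : original;
    }

    public static void main(String[] args) {
        System.out.println(unescapeHtml("&lt;p&gt;It&#39;s &amp;lt;3 &#x41;&lt;/p&gt;"));
        // prints: <p>It's &lt;3 A</p>
    }
}
```

The matcher walks forward and decodes each reference exactly once, so "&amp;lt;" becomes "&lt;" rather than "<". Like the chained version, this sketch requires the terminating semicolon and covers only a handful of named entities, which is one reason to prefer a library.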
2. HTML Escape/Unescape in .NET
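In .NET, you can use the HttpUtility class (in the System.Web namespace) to escape and unescape HTML strings. On .NET Framework it requires a reference to System.Web.dll, and it is also available in .NET Core 2.0 and later. Alternatively, the System.Net.WebUtility class provides HtmlEncode and HtmlDecode methods without depending on System.Web. It works on both .NET Framework (4.0 and later) and .NET Core / .NET 5+.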
HTML Escape and Unescape in .NET (Using HttpUtility)
HTML Escape Example (Using HttpUtility)
```csharp
using System;
using System.Web; // On .NET Framework, add a reference to System.Web.dll

public class HTMLEscapeExample
{
    public static void Main()
    {
        string input = "<div>Hello & welcome!</div>";
        string escaped = HttpUtility.HtmlEncode(input);
        Console.WriteLine(escaped); // Output: &lt;div&gt;Hello &amp; welcome!&lt;/div&gt;
    }
}
```
HTML Unescape Example (Using HttpUtility)
```csharp
using System;
using System.Web; // On .NET Framework, add a reference to System.Web.dll

public class HTMLUnescapeExample
{
    public static void Main()
    {
        string escaped = "&lt;div&gt;Hello &amp; welcome!&lt;/div&gt;";
        string unescaped = HttpUtility.HtmlDecode(escaped);
        Console.WriteLine(unescaped); // Output: <div>Hello & welcome!</div>
    }
}
```
HTML Escape and Unescape in .NET Core (Using WebUtility)
System.Net.WebUtility is available in .NET Core and .NET 5+ as well as .NET Framework 4.0 and later, so it is a good default when you don't want a dependency on System.Web.
HTML Escape Example (Using WebUtility in .NET Core)
```csharp
using System;
using System.Net;

public class HTMLEscapeExample
{
    public static void Main()
    {
        string input = "<div>Hello & welcome!</div>";
        string escaped = WebUtility.HtmlEncode(input);
        Console.WriteLine(escaped); // Output: &lt;div&gt;Hello &amp; welcome!&lt;/div&gt;
    }
}
```
HTML Unescape Example (Using WebUtility in .NET Core)
```csharp
using System;
using System.Net;

public class HTMLUnescapeExample
{
    public static void Main()
    {
        string escaped = "&lt;div&gt;Hello &amp; welcome!&lt;/div&gt;";
        string unescaped = WebUtility.HtmlDecode(escaped);
        Console.WriteLine(unescaped); // Output: <div>Hello & welcome!</div>
    }
}
```
Summary
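HTML Escape: This process converts special characters (like <, >, &, ", and ') into HTML entities (like &lt;, &gt;, &amp;, &quot;, and &#39;), making them safe to include in HTML content. Libraries differ in exactly which characters they escape. For example, Commons Text's escapeHtml4 leaves the apostrophe unescaped.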
HTML Unescape: This reverses the escaping process, converting HTML entities back to their respective characters.
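Because unescaping is meant to reverse escaping, a quick check for a hand-written pair is to confirm that unescape(escape(s)) returns s. The test should include text that already looks like an entity. The sketch below uses chained replace calls; its class and method names are illustrative. It also includes a deliberately wrong variant that decodes &amp; first, to show how the order of the replacements breaks the round trip.

```java
public class HtmlRoundTrip {
    // Escape: '&' is replaced first so the '&' in newly added entities
    // is not escaped a second time.
    public static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&#39;");
    }

    // Unescape: the exact reverse, so "&amp;" is decoded last.
    public static String unescape(String s) {
        return s.replace("&#39;", "'")
                .replace("&quot;", "\"")
                .replace("&gt;", ">")
                .replace("&lt;", "<")
                .replace("&amp;", "&");
    }

    // Deliberately buggy: decoding "&amp;" first turns "&amp;lt;" into "<".
    public static String unescapeWrongOrder(String s) {
        return s.replace("&amp;", "&")
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&#39;", "'");
    }

    public static void main(String[] args) {
        String original = "Type &lt; to get <";
        String escaped = escape(original);
        System.out.println(escaped);                     // Type &amp;lt; to get &lt;
        System.out.println(unescape(escaped));           // Type &lt; to get <
        System.out.println(unescapeWrongOrder(escaped)); // Type < to get <
    }
}
```

The correct pair restores the literal text "&lt;". The wrong-order variant silently turns it into "<", changing what the user actually wrote.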
Best Practices:
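Libraries: Use StringEscapeUtils (Apache Commons Text, Java) or HttpUtility/WebUtility (.NET) rather than hand-written code. These implementations are well tested and handle cases a quick manual version misses, such as the many named and numeric entities that can appear when unescaping. Escaping untrusted data before writing it into HTML text or quoted attribute values helps protect against Cross-Site Scripting (XSS). Values placed inside JavaScript, CSS, or URLs need encoding specific to that context.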
Manual Methods: While it's possible to escape and unescape HTML by hand, it's error-prone. The order of the replacements matters, and a simple version recognizes only a handful of entities. Prefer a library when one is available, for better maintainability and security.
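To make the XSS point concrete, here is a minimal sketch of escaping untrusted values before concatenating them into an HTML template. renderComment is a hypothetical helper. The escaper is the chained-replace approach discussed earlier, kept inline so the example is self-contained; in real code a library method would normally take its place.

```java
public class XssEscapeDemo {
    // Minimal chained-replace escaper; '&' goes first so entities are not re-escaped.
    public static String escapeHtml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&#39;");
    }

    // Hypothetical helper: both untrusted values are escaped before they are
    // placed into the element content and the double-quoted title attribute.
    public static String renderComment(String author, String text) {
        return "<p class=\"comment\" title=\"" + escapeHtml(author) + "\">"
                + escapeHtml(text) + "</p>";
    }

    public static void main(String[] args) {
        String author = "Eve\" onmouseover=\"alert(1)";
        String text = "<script>alert('x')</script>";
        System.out.println(renderComment(author, text));
        // prints: <p class="comment" title="Eve&quot; onmouseover=&quot;alert(1)">&lt;script&gt;alert(&#39;x&#39;)&lt;/script&gt;</p>
    }
}
```

The injected quote can no longer close the title attribute, and the script tag is rendered as visible text instead of being executed.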