Monday, February 28, 2005

If you have an object that you would like to serialize into XML as a string, you can do the following:

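// requires: using System.IO; using System.Xml.Serialization;
// returns a utf-16 string (the XML declaration will say encoding="utf-16")
XmlSerializer ser = new XmlSerializer(test.GetType());
StringWriter sw = new StringWriter();
ser.Serialize(sw, test);
ret = sw.ToString();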
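
For anyone who wants to try this end to end, here is a minimal sketch of the same steps wrapped in a method. The Person class, the Utf16Example class and the ToXmlString name are made up for illustration and are not part of the original code; any public type with a public parameterless constructor should work in their place.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical sample type, only here so the example has something to serialize.
public class Person
{
    public string Name;
}

public static class Utf16Example
{
    // Same steps as the snippet above: serialize into a StringWriter
    // and return whatever text it collected.
    public static string ToXmlString(object test)
    {
        XmlSerializer ser = new XmlSerializer(test.GetType());
        using (StringWriter sw = new StringWriter())
        {
            ser.Serialize(sw, test);
            return sw.ToString();
        }
    }

    public static void Main()
    {
        Person p = new Person();
        p.Name = "Ada";
        string xml = ToXmlString(p);

        // Print the result so the encoding named in the XML declaration is visible.
        Console.WriteLine(xml);
        Console.WriteLine(xml.Contains("encoding=\"utf-16\""));
    }
}
```

The declaration names utf-16 because that is what StringWriter reports through its Encoding property.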

You cannot use the above code to return a utf-8 encoded string, because the StringWriter class does not let you set the encoding: its Encoding property is read-only. The XmlTextWriter looked like a way to serialize the object in utf-8 instead. One of its overloaded constructors takes a TextWriter as an argument, so I thought that I could create the XmlTextWriter, pass in the StringWriter (which derives from TextWriter), and then set the Encoding property on the constructed XmlTextWriter object. Unfortunately, the XmlTextWriter object has no Encoding property.

The XmlTextWriter has two other constructors, one that takes a Stream and one that takes a filename. Both also take an Encoding object as a parameter. Since I want to write to a string rather than a file, I will use the constructor that takes a Stream, and pass in a MemoryStream object:

// returns a utf-8 string with a prepended byte order mark
MemoryStream memStrmWrite = new MemoryStream();
XmlSerializer ser = new XmlSerializer(test.GetType());
XmlTextWriter xtw = new XmlTextWriter(memStrmWrite, Encoding.UTF8);
ser.Serialize(xtw, test);
// BaseStream is the same MemoryStream that was passed to the constructor
memStrmWrite = (MemoryStream)xtw.BaseStream;
// GetString decodes every byte, including the byte order mark
UTF8Encoding enc = new UTF8Encoding();
ret = enc.GetString(memStrmWrite.ToArray());

This does return the XML in the correct encoding, but it also prepends a byte order mark character to the XML (which most XML 1.0+ processors should handle):

?<?xml version="1.0" encoding="utf-8"?>....
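
If you want to confirm that the extra character really is a byte order mark, a small check like the following sketch can help. The Person, BomCheck, SerializeToUtf8Bytes and StartsWithUtf8Bom names are hypothetical and only serve the illustration; the serialization steps are the same ones used above.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical sample type, only here so the example has something to serialize.
public class Person
{
    public string Name;
}

public static class BomCheck
{
    // Serializes through an XmlTextWriter with Encoding.UTF8, as above,
    // and returns the raw bytes that ended up in the MemoryStream.
    public static byte[] SerializeToUtf8Bytes(object test)
    {
        MemoryStream ms = new MemoryStream();
        XmlSerializer ser = new XmlSerializer(test.GetType());
        XmlTextWriter xtw = new XmlTextWriter(ms, Encoding.UTF8);
        ser.Serialize(xtw, test);
        xtw.Flush(); // make sure everything has reached the stream
        return ms.ToArray();
    }

    // True when the buffer starts with the UTF-8 byte order mark (EF BB BF).
    public static bool StartsWithUtf8Bom(byte[] bytes)
    {
        return bytes.Length >= 3
            && bytes[0] == 0xEF
            && bytes[1] == 0xBB
            && bytes[2] == 0xBF;
    }

    public static void Main()
    {
        Person p = new Person();
        p.Name = "Ada";
        byte[] bytes = SerializeToUtf8Bytes(p);
        Console.WriteLine(StartsWithUtf8Bom(bytes));

        // Decoding with GetString keeps the mark as the character U+FEFF.
        string decoded = new UTF8Encoding().GetString(bytes);
        Console.WriteLine(decoded[0] == '\uFEFF');
    }
}
```

The three bytes decode to the single character U+FEFF, which is why the string shows one stray character in front of the XML declaration.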

To get the properly encoded string without the leading extra character, reset the position of the MemoryStream to zero after the Serialize() method has executed, wrap the MemoryStream in a StreamReader, and call the StreamReader.ReadToEnd() method. The StreamReader recognizes the byte order mark and leaves it out of the returned string:

MemoryStream memStrmWrite = new MemoryStream();
XmlSerializer ser = new XmlSerializer(test.GetType());
XmlTextWriter xtw = new XmlTextWriter(memStrmWrite, Encoding.UTF8);
ser.Serialize(xtw, test);
memStrmWrite = (MemoryStream)xtw.BaseStream;

// Keeps the byte order mark intact
if (_bKeepByteOrderMark == true)
{
    UTF8Encoding enc = new UTF8Encoding();
    ret = enc.GetString(memStrmWrite.ToArray());
}
else
{
    memStrmWrite.Position = 0; // reset the position to 0
    StreamReader sr = new StreamReader(memStrmWrite, Encoding.UTF8);
    ret = sr.ReadToEnd();
}

With _bKeepByteOrderMark set to false, this removes the byte order mark from the beginning of the XML:

<?xml version="1.0" encoding="utf-8"?>....
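
A different route, not the one used above, is to keep the byte order mark from being written in the first place. The UTF8Encoding constructor takes a bool that controls whether the encoding emits the mark, so passing false and handing that encoding to the XmlTextWriter should give a clean utf-8 string without the StreamReader step. The sketch below assumes that approach; the Person, NoBomExample and ToUtf8XmlString names are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical sample type, only here so the example has something to serialize.
public class Person
{
    public string Name;
}

public static class NoBomExample
{
    public static string ToUtf8XmlString(object test)
    {
        // false = this encoding does not emit a byte order mark
        UTF8Encoding utf8NoBom = new UTF8Encoding(false);

        using (MemoryStream ms = new MemoryStream())
        {
            XmlSerializer ser = new XmlSerializer(test.GetType());
            XmlTextWriter xtw = new XmlTextWriter(ms, utf8NoBom);
            ser.Serialize(xtw, test);
            xtw.Flush(); // make sure everything has reached the stream

            // No mark was written, so a plain GetString is enough here.
            return utf8NoBom.GetString(ms.ToArray());
        }
    }

    public static void Main()
    {
        Person p = new Person();
        p.Name = "Ada";
        string xml = ToUtf8XmlString(p);
        Console.WriteLine(xml);
        Console.WriteLine(xml.StartsWith("<?xml"));
    }
}
```

The trade-off is that you lose the option of keeping the mark, so the _bKeepByteOrderMark version above is still the one to use if some callers need it.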

Here are a few links on this:

http://www.eggheadcafe.com/articles/system.xml.xmlserialization.asp

http://weblogs.asp.net/rmclaws/archive/2003/07/31/22080.aspx

http://msdn.microsoft.com/msdnmag/issues/01/05/xml/
