Posted 1477 days ago
So since I'm bored, unemployed, and spending a lot of time goofing around with my websites, I thought it'd be fun to write a little RSS parsing utility so I could put together a toy RSS aggregator site (yeah yeah yeah, it's been done, I know... but where's the learning experience in using someone else's code?). So I've written a class that extends org.xml.sax.helpers.DefaultHandler to be my callback handler for RSS 2.0 XML streams. In this class, I use the characters method to handle text (as you're supposed to). It appears, however, that there are "special" characters in the XML stream that either break the input or cause '?' characters to appear in my output.
By "special" characters in the XML stream, I mean character codes like ‘ (U+2018), which is supposed to be an opening single quotation mark. This appears as a '?' in the output. My first guess is that it has something to do with the character encoding of the input stream (in this case, "UTF-8"). Shouldn't the XML parser be able to detect the encoding and handle it automatically, though? If not, how do I get around it? Perhaps it's not a character encoding issue at all -- does anyone have any other ideas?
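For reference, here is a rough sketch of the kind of handler described above, fed a tiny in-memory feed that contains an opening single quotation mark both as a numeric character reference and as a raw UTF-8 character. The class names (RssTitleSketch, TitleHandler), the parseTitles helper, and the sample feed are all made up for illustration; this is not the actual aggregator code. It prints the code point in hex rather than the character itself, so the result doesn't depend on how the terminal is configured.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class RssTitleSketch {

    // Hypothetical handler: collects the text of every <title> element.
    static class TitleHandler extends DefaultHandler {
        final List<String> titles = new ArrayList<>();
        private StringBuilder text; // non-null only while inside a <title>

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("title".equals(qName)) {
                text = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // The parser may hand over one text node in several chunks, so append.
            if (text != null) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("title".equals(qName) && text != null) {
                titles.add(text.toString());
                text = null;
            }
        }
    }

    // Parses raw bytes; the parser works out the encoding from the XML declaration.
    public static List<String> parseTitles(InputStream in) throws Exception {
        TitleHandler handler = new TitleHandler();
        SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        return handler.titles;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample feed: one quote as a character reference, one as a raw character.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<rss version=\"2.0\"><channel>"
                + "<title>&#8216;Hello&#8217; \u2018raw\u2019</title>"
                + "</channel></rss>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        String title = parseTitles(in).get(0);
        // Print the first code point in hex instead of the character itself.
        System.out.println(Integer.toHexString(title.codePointAt(0))); // prints 2018
    }
}
```

If the hex value comes out as 2018, the parser decoded the input correctly, and any '?' showing up later is being introduced somewhere on the output side.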
Please leave comments!
UPDATE: Turns out the problem was not in the input, but rather in the output! My terminal shell was not set to recognize the UTF-8 character set, and was therefore drawing '?' symbols for character codes it didn't recognize. On top of that, the JSP I was using to test it wasn't setting the content-type properly, and my browser doesn't default to UTF-8 either. Thanks to those of you who got back to me, both in comments and via email.
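To illustrate the output side of the problem, here is a small sketch (the class name Utf8OutputSketch and the encode helper are invented for this example) showing what a PrintStream writes for U+2018 under two different encodings. With US-ASCII the character can't be represented, so the encoder substitutes a '?'. With UTF-8 it comes out as the three bytes E2 80 98. Whether the terminal then draws those bytes correctly is a separate matter and depends on the terminal's own settings.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.util.Arrays;

class Utf8OutputSketch {

    // Returns the bytes a PrintStream emits for the text under the named charset.
    public static byte[] encode(String text, String charsetName)
            throws UnsupportedEncodingException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(buf, true, charsetName);
        out.print(text);
        out.flush();
        return buf.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        String quote = "\u2018"; // opening single quotation mark

        // US-ASCII has no mapping for U+2018, so a '?' (byte 63) is written instead.
        System.out.println(Arrays.toString(encode(quote, "US-ASCII"))); // prints [63]

        // UTF-8 encodes U+2018 as E2 80 98 (shown here as signed bytes).
        System.out.println(Arrays.toString(encode(quote, "UTF-8"))); // prints [-30, -128, -104]

        // Wrapping System.out forces UTF-8 bytes regardless of the platform default.
        // The terminal still has to be configured to interpret them as UTF-8.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(quote);
    }
}
```

For the JSP case, the analogous fix would presumably be declaring the charset in the page directive, something like `<%@ page contentType="text/html; charset=UTF-8" %>`, so the browser doesn't have to guess.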