danthedataman.com

August 20, 2008

get back to blogging with fix for invalid XML

Filed under: SSIS — Dan @ 4:11 pm
Tags: , ,

Whew!  I can’t believe it’s been a month and a half since I last blogged.  And I hadn’t even posted any kind of data blog yet on WordPress, which is ironic given my blog url…… 

So here’s a little something to remedy both of those situations.

Trying to import an XML file with SQL Server 2005 Integration Services (SSIS), I encountered the following error: [XML Source [1]] Error: The component “XML Source” (1) was unable to process the XML data. ‘’, hexadecimal value 0x13, is an invalid character. Line 9634, position 40. 

There is no doubt a better, more deterministic way to solve this, and I may get to that eventually, but for now, here are some resources that enabled me to work around it and keep moving on my project.

Became aware of invalid XML characters, so the above error I received made more sense to me, from here: http://kbalertz.com/325694/Parsing-Containing-Invalid-Character-Raise-ArgumentException.aspx

http://www.asciitable.com/ — gives Hex & Octal ASCII values, which I needed, as well as Decimal value and character description.

Found my solution: http://www.unix.com/shell-programming-scripting/36352-replace-junk-chars-sed.html.  Since SSIS did not even seem able to handle the invalid XML, I had to deal with it before it got to SSIS.  Next logical place was on the Linux server where the XML files were being generated, where I did have some similar logic to resolve some similar issues.  I had already, using iconv, translated the file to UTF-8, but that obviously wasn’t helping.  Was using sed to replace dots with underscores, but did not find any documentation readily which talked about using ASCII code with sed.  The link at the top of this paragraph made me aware of tr, which I was able to use successfully to remove the invalid XML character.  Here is the command I used: tr -d ‘\23’ < INPUTFILE.xml > OUTPUTFILE.xml.  tr stands for ‘translate,’ it appears; the -d option deletes the indicated string from input.  I used http://asciitable.com (mentioned above) to convert the Hex 13 I had into the Octal 23 required for the tr command.

Well, that’s it for now.  Hope this is helpful for someone.

-Dan

Advertisement

Create a free website or blog at WordPress.com.