danthedataman.com

August 20, 2008

get back to blogging with fix for invalid XML

Filed under: SSIS — Dan @ 4:11 pm
Tags: , ,

Whew!  I can’t believe it’s been a month and a half since I last blogged.  And I hadn’t even posted any kind of data blog yet on WordPress, which is ironic given my blog url…… 

So here’s a little something to remedy both of those situations.

Trying to import an XML file with SQL Server 2005 Integration Services (SSIS), I encountered the following error: [XML Source [1]] Error: The component “XML Source” (1) was unable to process the XML data. ‘’, hexadecimal value 0x13, is an invalid character. Line 9634, position 40. 

There is no doubt a better, more deterministic way to solve this, and I may get to that eventually, but for now, here are some resources that enabled me to work around it and keep moving on my project.

Became aware of invalid XML characters, so the above error I received made more sense to me, from here: http://kbalertz.com/325694/Parsing-Containing-Invalid-Character-Raise-ArgumentException.aspx

http://www.asciitable.com/ — gives Hex & Octal ASCII values, which I needed, as well as Decimal value and character description.

Found my solution: http://www.unix.com/shell-programming-scripting/36352-replace-junk-chars-sed.html.  Since SSIS did not even seem able to handle the invalid XML, I had to deal with it before it got to SSIS.  Next logical place was on the Linux server where the XML files were being generated, where I did have some similar logic to resolve some similar issues.  I had already, using iconv, translated the file to UTF-8, but that obviously wasn’t helping.  Was using sed to replace dots with underscores, but did not find any documentation readily which talked about using ASCII code with sed.  The link at the top of this paragraph made me aware of tr, which I was able to use successfully to remove the invalid XML character.  Here is the command I used: tr -d ‘\23’ < INPUTFILE.xml > OUTPUTFILE.xml.  tr stands for ‘translate,’ it appears; the -d option deletes the indicated string from input.  I used http://asciitable.com (mentioned above) to convert the Hex 13 I had into the Octal 23 required for the tr command.

Well, that’s it for now.  Hope this is helpful for someone.

-Dan

Advertisements

2 Comments »

  1. Thanks! I found an even more useful option to tr:
    tr -d [:cntrl:] out.xml

    which deletes the control characters themselves.

    Comment by Neil — December 2, 2008 @ 2:47 pm | Reply

  2. […] There were a few Unidata (Colleague) files which contained character(s) in the data which SSIS did not like.  I blogged about that, along with what I did about it, here. […]

    Pingback by More: export XML from Unidata (and import into SQL Server) « danthedataman.com — March 23, 2009 @ 8:46 pm | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.

%d bloggers like this: