2 fields have embedded \n's, how to get rid of these?

2 fields have embedded \n's, how to get rid of these?

Post by chilecayen » Thu, 28 Oct 2004 04:04:52


Hello all,

I'm working on a csv file I got from excel (I hate these). I've used ~
(tilde) as a delimiter. In a couple of fields...they were free form
text...they have some embedded \n's that seem to be throwing things
off.

I know which fields to watch this for...let's say $5 and $10

The \n's seem to be throwing things off on simple print statements.
I've been looking over USENET and websites through Google...and
haven't found a good answer for cleaning this up. I'd like to either
take out the newlines in that field...or maybe just substitute
'CHR(13)' I think it is...as the text that will be used to put this
into an Oracle insert statement.

I've use sed in the past to fix join lines...but, this mangles the
whole file, not just the field (sed ':a; /.*$/N; s/\n/,/; ta')

I was thinking of maybe running this from within awk on $5..using the
getline function...but, can't figure how to get that going.

Can I convert the \n's within a field in awk...if not, is there a good
way with sed or another tool to fix it in the field before running it
through awk.

Thanks in advance...I've seen many posts similar to this...but, not
finding a good solution....

chilecayenne
 
 
 

2 fields have embedded \n's, how to get rid of these?

Post by gazell » Thu, 28 Oct 2004 04:30:22

In article < XXXX@XXXXX.COM >,


The "CSV problem" comes up frequently here. Various people have come up
with various solutions, ranging from the simple to the bizarrely complex.

Have you tried setting RS=~ ? That should make each field come in as
a separate line (record), regardless of any embedded newlines.