Question about incorrect characters


Post by EA » Tue, 10 Feb 2004 03:31:08


I use Pegasus Mail ver 4.12a. Recently I sent an email in which a letter was
enclosed in quotation marks ("G"). The people who received the email
told me that instead of quotation marks they saw question marks (?G?).
I was puzzled by that because the content type of my message was

Content-type: text/plain; charset=ISO-8859-1

and I thought that makes it unlikely to have problems that sometimes
come up with html content or rich text. However, I also noticed that
the copy-to-self of the email I sent had

Content-transfer-encoding: 8BIT

in the headers, even though in the preferences I have not checked the
option to allow 8bit encoding.

The people who received the email use Netscape as their email client. I'm
not sure what the problem is or what can be done to avoid it. Any
suggestions?

Emmanuel
 
 
 


Post by Toma » Tue, 10 Feb 2004 08:23:46

It is not only in the copy-to-self folder that European letters become
garbled.

Some of the filter rules don't work with European letters either.


I don't think there is anything that can be done about it.

 
 
 


Post by EA » Tue, 10 Feb 2004 09:59:31


"Tomas" < XXXX@XXXXX.COM > typed in




I'm not sure I understand your reply. What European letters? I'm in
the US and the people who received the email were in the US too. The
problem was not in the copy-to-self folder (the message was shown correctly
there) but in the email as the recipients saw it.

E.
 
 
 


Post by Robert Pyz » Tue, 10 Feb 2004 23:21:27


Tomas

I'm a Polish user of PM, and I have Polish letters in my 'copies to self'
folder. Polish users of PM change the *.r files for code page ISO 8859-2
(our standard). We use this compiled *.r file:
http://www.yqcomputer.com/ ~sapi/pmail/konw_tab.zip
It comes from this page (Polish only, sorry):
http://www.yqcomputer.com/ ~sapi/pmail/pegasus_faq_pl
Maybe you can change the files wpm-lmtt.r and wpm-char.r for your language.
P.S. Point 15 on that page lists two e-mail addresses; try asking those people
for the non-compiled *.r files for ISO 8859-2 to use as a sample.


--
Robert Pyzel r p y z e l @ g a z e t a . p l (remove BezSpamu )
 
 
 


Post by J. David B » Wed, 11 Feb 2004 02:59:57

On Sun, 08 Feb 2004 18:31:08 GMT in article
< XXXX@XXXXX.COM >, EA wrote...

Were they quotation marks or "typographic quotation marks" (a.k.a. "smart
quotes" or "curly quotes")?



If you sent typographic quotation marks, e.g., those produced on a Windows
system by entering ALT+0147 and ALT+0148 on the keyboard or by pasting from MS
Word, then this is quite possible, because...



...ISO-8859-1 does not have typographic quotation marks in its character
repertoire. See, for example:

http://www.yqcomputer.com/ ~jkorpela/chars.html#win

...which says, in part:

In ISO 8859-1, code positions 128 - 159 are explicitly reserved for control
purposes; they "correspond to bit combinations that do not represent graphic
characters". The so-called Windows character set (WinLatin1, or Windows code
page 1252, to be exact) uses some of those positions for printable
characters. Thus, the Windows character set is not identical with ISO
8859-1.

[...]

In the Windows character set, some positions in the range 128 - 159 are
assigned to printable characters, such as "smart quotes", em dash, en dash,
and trademark symbol. Thus, the character repertoire is larger than ISO
Latin 1. The use of octets in the range 128 - 159 in any data to be
processed by a program that expects ISO 8859-1 encoded data is an error
which might cause just anything. They might for example get ignored, or be
processed in a manner which looks meaningful, or be interpreted as control
characters.



I suspect that the Netscape e-mail client is treating the Content-type header
strictly and so is displaying question marks for the reserved ISO-8859-1
character encodings as an indication of error. Pegasus Mail, apparently, does
not, as it will both happily transmit messages containing those character
encodings but labelled ISO-8859-1, as well as display the Windows characters
assigned to those (reserved) positions. This behavior is consistent with the
last sentence quoted above, if less than ideal.
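
The difference between the two interpretations can be sketched in a few lines of Python. This is only an illustration of the charsets involved, not of what Netscape or Pegasus Mail do internally; the byte string and the helper name describe() are made up for the example.

```python
# The bytes a Windows application would typically emit for the letter G
# wrapped in typographic double quotes (ALT+0147 and ALT+0148).
raw = b"\x93G\x94"

as_cp1252 = raw.decode("cp1252")      # interpreted as Windows code page 1252
as_latin1 = raw.decode("iso-8859-1")  # interpreted as the header claims


def describe(text):
    """Return the Unicode code point of each character, e.g. 'U+201C'."""
    return [f"U+{ord(ch):04X}" for ch in text]


print(describe(as_cp1252))  # ['U+201C', 'U+0047', 'U+201D'] (curly quotes)
print(describe(as_latin1))  # ['U+0093', 'U+0047', 'U+0094'] (control codes)
```

Read as windows-1252 the bytes are quotation marks; read as ISO-8859-1 they land in the reserved control range, which a strict client has no glyph for.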



Two correct methods would be:

1. Send the correct "charset" parameter, i.e., the message header should be:

Content-type: text/plain; charset=windows-1252

...but I don't know whether PM can be coerced into doing this.

2. Substitute proper ISO-8859-1 characters for the "Windows characters",
e.g., change the "smart quotes" to regular quotation marks (ALT+0034).
I do this manually whenever I paste from a "smart quote" source.
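
The manual substitution in method 2 could also be scripted. The following is a minimal Python sketch, assuming the text is available as a Unicode string before it reaches the mailer; the table PLAIN_EQUIVALENTS and the function straighten() are hypothetical names, and the choice of replacements (for example "--" for an em dash) is only one reasonable convention.

```python
# Hypothetical replacement table: typographic characters on the left,
# plain equivalents that exist in US-ASCII (and so in ISO-8859-1) on the right.
PLAIN_EQUIVALENTS = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # horizontal ellipsis
}

_TABLE = str.maketrans(PLAIN_EQUIVALENTS)


def straighten(text: str) -> str:
    """Replace typographic characters with plain equivalents."""
    return text.translate(_TABLE)


print(straighten("\u201cG\u201d"))  # "G"
```

After this step the text encodes cleanly as ISO-8859-1, so the existing Content-type header becomes truthful.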


Additional references:

http://www.yqcomputer.com/
http://www.yqcomputer.com/ ~jkorpela/latin1/1.html
--
-- Dave Bryan

NOTE: Due to unrelenting SPAM, I regret that I have been forced to post with
an encoded address. Please ROT13 this message to obtain the reply address.
Address also available at http://www.yqcomputer.com/ ~dbryan/mailto.html

Cyrnfr ercyl gb guvf nqqerff: XXXX@XXXXX.COM
 
 
 


Post by EA » Wed, 11 Feb 2004 11:03:02

"J. David Bryan" < XXXX@XXXXX.COM > typed in



...

Thank you for your informative answer Dave! It explains what happened
because the content of the email that I sent was pasted from another
application.... Thanks again!

Emmanuel
 
 
 


Post by J. David B » Thu, 12 Feb 2004 00:43:19

On Tue, 10 Feb 2004 02:03:02 GMT in article
< XXXX@XXXXX.COM >, EA wrote...

You're welcome.



Problematic characters that routinely appear in content pasted from other
Windows applications (particularly MS Office applications) are the left and
right single typographic quotation marks, the left and right double typographic
quotation marks, the bullet, the en-dash, and the em-dash. Of course, all of
the characters in the range ALT+0128 through ALT+0159 are problematic, but
those are the ones I commonly encounter and have to recode for reliable e-mail
transmission using ISO-8859-1.
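
A quick way to spot such characters before sending is to scan the raw bytes for values 128 through 159. The sketch below assumes the body is available as bytes; find_c1_bytes() is a hypothetical helper, and decoding each offending byte as cp1252 is only a guess at what the sender meant.

```python
import unicodedata


def find_c1_bytes(body: bytes):
    """List (offset, byte value, probable meaning) for bytes in 0x80-0x9F.

    Those positions are reserved in ISO-8859-1, so any hit means the body
    is not valid ISO-8859-1. The meaning is looked up in Windows-1252.
    """
    findings = []
    for offset, value in enumerate(body):
        if 0x80 <= value <= 0x9F:
            # A few positions are unassigned even in cp1252; those decode
            # to U+FFFD (REPLACEMENT CHARACTER) with errors="replace".
            ch = bytes([value]).decode("cp1252", errors="replace")
            findings.append((offset, value, unicodedata.name(ch, "UNKNOWN")))
    return findings


for offset, value, name in find_c1_bytes(b"\x93G\x94 \x95 item"):
    print(offset, hex(value), name)
```

An empty result means the body contains nothing from the reserved range.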
--
-- Dave Bryan

 
 
 


Post by EA » Thu, 12 Feb 2004 06:10:13

"J. David Bryan" < XXXX@XXXXX.COM > typed in



Yes, the problem was with the double quotations... I have another
question for you, if you don't mind:

I tried to specify windows-1252 for the character set but apparently PM
does not accept that. However, I tried sending an email from home (PM,
ISO-8859-1, MIME enabled) to my work address and, when I retrieved it
at work, the email headers said char set US-ASCII, 7 bit. Is that
equivalent to ISO-8859-1? Also, is it possible to avoid such problems
if I disable MIME?
Thank you for taking the time to reply...you have been very helpful!!

Emmanuel
 
 
 


Post by J. David B » Fri, 13 Feb 2004 01:11:34

On Tue, 10 Feb 2004 21:10:13 GMT in article
< XXXX@XXXXX.COM >, EA wrote...

Yes, I rather suspected that, based on the Pegasus Mail help file, which says
(for the "Default MIME character set" option), "Valid values are ISO8859-x,
where 'x' is a digit between 1 and 9."

PM does support something called the "Local MIME Translation Table," which is
referred to a bit later in the same help file: "If you have added a
comprehensive MIME character mapping table using an external resource file,
you can also enter the name of that character set here." It is contained in
the file WPM-LMTT.R in the "Resource" subdirectory of the PM executable
location. I've never investigated the use of this table, however, so I can't
offer advice beyond checking the newsgroup archives via:

http://groups.google.com/advanced_group_search

...for earlier discussion. You may be able to define a Windows-1252 table,
for example.



US-ASCII is the subset of ISO-8859-1 that contains character encodings from
0-127. All of the problematic characters are 128 and above, though, so a
message can't be US-ASCII if it contains any of those, e.g., contains the
typographic quotation marks.
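
The subset relationship can be made concrete by asking which is the narrowest label that can honestly describe a given text. This is a Python sketch only; narrowest_charset() is a hypothetical helper, and the fallback to UTF-8 is an assumption of the example rather than anything Pegasus Mail offers.

```python
def narrowest_charset(text: str) -> str:
    """Return the first charset label, narrowest first, that can encode text."""
    candidates = (
        ("us-ascii", "ascii"),        # codes 0-127 only
        ("iso-8859-1", "iso-8859-1"), # adds 160-255
        ("windows-1252", "cp1252"),   # adds printable characters in 128-159
    )
    for label, codec in candidates:
        try:
            text.encode(codec)
        except UnicodeEncodeError:
            continue
        return label
    return "utf-8"  # assumed fallback for everything else


print(narrowest_charset('"G"'))            # us-ascii
print(narrowest_charset("caf\u00e9"))      # iso-8859-1
print(narrowest_charset("\u201cG\u201d"))  # windows-1252
```

A message with plain quotation marks comes out as us-ascii, which matches the headers seen on the test message received at work.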



No. The "Content-type" header is a MIME header, and without that, the message
is assumed to be "Content-type: text/plain; charset=us-ascii". So you're
still in the same boat. I'm afraid that you're stuck with the original choices:
label the message as "charset=windows-1252" to assign proper interpretation
to the problematic encodings, or do not use the Windows special encodings in
ISO-8859-1 or US-ASCII messages.

(The "charset" attribute is intended to indicate to the recipient how to
render the character encodings in the body of the message, e.g., to indicate
what character encoding number 146 "means." That will vary from charset to
charset; the same character encoding number may mean different things in
different charsets. Encodings that are deemed reserved or not present in the
specified charset simply make the message erroneous, i.e., there is no valid
interpretation. As noted in one of my
earlier citations, your recipient may or may not see the intended rendering --
an example might be sending "Windows encodings" improperly labelled as
ISO-8859-1 to a Macintosh e-mail client. However, simply setting the charset
does not produce legal character encodings...unless the e-mail client enforces
this. PM does not [unless, perhaps, via the LMTT mechanism]. So, in the end,
you must decide on how you wish to encode your message -- that is, what
characters you wish to include in the text -- and then select a character
encoding to match. Or, if your selections are limited, then you must choose
your encodings to match those limitations.)
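
For comparison, here is what a mailer that does enforce the label looks like, sketched with Python's standard email package rather than Pegasus Mail. The helper build_message() and the subject text are invented for the example; the point is only that the body is encoded with the named charset, so a mismatch fails at composition time instead of reaching the recipient.

```python
from email.message import EmailMessage


def build_message(body: str, charset: str) -> EmailMessage:
    """Create a text/plain message whose body is encoded as `charset`."""
    msg = EmailMessage()
    msg["Subject"] = "Charset labelling example"
    # set_content() encodes the body with the requested charset and records
    # it in the Content-Type header; text that the charset cannot represent
    # raises UnicodeEncodeError here.
    msg.set_content(body, charset=charset)
    return msg


msg = build_message("\u201cG\u201d", "windows-1252")
print(msg["Content-Type"])

try:
    build_message("\u201cG\u201d", "iso-8859-1")
except UnicodeEncodeError as exc:
    print("cannot label as iso-8859-1:", exc.reason)
```

Either the label is changed to fit the characters, or the characters are changed to fit the label, which are the same two choices described above.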



You're welcome.
--
-- Dave Bryan

NOTE: Due to unrelenting SPAM, I regret that I have been forced to post with
an encoded address. Please ROT13 this message to obtain the reply address.
Address also available at http://www.bcpl.net/~dbryan/mailto.html

Cyrnfr ercyl gb guvf nqqerff: XXXX@XXXXX.COM

 
 
 


Post by RobertS » Fri, 13 Feb 2004 01:29:43

J. David Bryan < XXXX@XXXXX.COM >:


One could define it in the table, but would the recipient display it
correctly? He could, I guess, use it to send the message locally ;)

I think what could be tried, though, would be to modify the existing
ISO-8859-1 wpm-lmtt.r table by entering simple quotes for codes
corresponding to smart quotes. Once recompiled, I believe that would cause
smart quotes to get straightened out when sending the message, as long as
MIME is on, though I'm not absolutely sure of that.
And for display of incoming mail, too.

--
Robert
User in front of @ is fake, actual user is: mkarta
 
 
 


Post by J. David B » Fri, 13 Feb 2004 02:14:56

On Wed, 11 Feb 2004 17:29:43 +0100 in article <c0dl9n$8ov$ XXXX@XXXXX.COM >,
RobertSA wrote...

I suspect so, as windows-1252 is a properly registered charset:

http://www.yqcomputer.com/

...and is likely to be supported even on non-Windows systems.



I had thought of that as an alternate too. I don't think that the table will
translate single characters into multiple ones (e.g., an em-dash into "--"),
but it would appear to handle "smart quotes" transliterations, for instance.
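
The one-to-one limitation is easy to picture as a 256-entry byte table. The Python sketch below is an analogy only: it does not reproduce the wpm-lmtt.r format, which is not described in this thread, and the names TABLE and transliterate() are hypothetical.

```python
# Start from an identity table: every byte maps to itself.
table = bytearray(range(256))

# Overwrite the Windows-1252 positions that have an obvious
# single-character plain equivalent.
for source, plain in {
    0x91: "'",  # left single quote
    0x92: "'",  # right single quote
    0x93: '"',  # left double quote
    0x94: '"',  # right double quote
    0x96: "-",  # en dash
    0x97: "-",  # em dash (cannot become "--": one byte in, one byte out)
}.items():
    table[source] = ord(plain)

TABLE = bytes(table)


def transliterate(body: bytes) -> bytes:
    """Apply the byte-for-byte table; the length never changes."""
    return body.translate(TABLE)


print(transliterate(b"\x93G\x94"))  # b'"G"'
```

Because each slot holds exactly one output byte, an em dash can only become a single hyphen in this scheme.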

I'm a bit surprised that PM, being pretty much a Windows-only application
these days, doesn't offer windows-1252 as a native choice, even if ISO-8859-1
is more widely supported.
--
-- Dave Bryan

 
 
 


Post by EA » Fri, 13 Feb 2004 02:52:55

"J. David Bryan" < XXXX@XXXXX.COM > typed in




Once again, thank you! I will experiment with the table but, like you
said, the best option when using Pegasus is to avoid the Windows
special encodings. I wonder, however, if there is a utility that would
transliterate such encodings when pasting from e.g. Word to another
application. Some time ago I was looking at clipboard managers and I
remember reading something similar. Unfortunately, it has been a while
and I do not recall the name of the program but if I find it, I will
post a message here...

Emmanuel
 
 
 


Post by RobertS » Fri, 13 Feb 2004 04:37:51

J. David Bryan < XXXX@XXXXX.COM >:


I stand corrected. It's just that I don't see much mail flying around with
this charset definition.


I have edited and compiled the table to handle Polish national characters
and I don't see a way of converting two-character sequences into single
ones, or else I'd have coded in UTF-support. I sometimes (rarely,
fortunately) get UTF-mail in Polish, and that is extremely hard to read.
This is one of two weaknesses of Pegasus versus OE. The other is that it
doesn't recognize digital signatures (I get digitally signed documents from
one of my banks).
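
Why UTF-8 mail is so hard to read in a client limited to single-byte tables can be shown in a few lines of Python. The sample sentence is an arbitrary Polish pangram chosen for the example, not taken from any message in this thread.

```python
text = "Za\u017c\u00f3\u0142\u0107 g\u0119\u015bl\u0105 ja\u017a\u0144"

# What travels over the wire when the sender uses UTF-8: each Polish
# diacritic becomes two bytes.
wire = text.encode("utf-8")

# What a client sees if it reads every byte as one ISO-8859-2 character.
garbled = wire.decode("iso-8859-2")

# The damage is reversible as long as no byte was dropped on the way.
repaired = garbled.encode("iso-8859-2").decode("utf-8")

print(len(text), len(wire), len(garbled))
print(repaired == text)
```

Each accented letter shows up as two unrelated characters, and a one-byte-to-one-byte table has no way to merge them back.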

--
Robert
User in front of @ is fake, actual user is: mkarta
 
 
 


Post by Claus Rug » Fri, 13 Feb 2004 07:45:38


But character sets numbered ISO8859-10 and above work too. You can
find for example an ISO8859-15 translation table in the wpm-lmtt.r file
which comes with the latest Pegasus Mail distribution. However, this
table has some errors and I decided to use my own (I sometimes need the
translation to properly encode the Euro sign).
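
The Euro sign is a good illustration of why the table matters, since it sits at a different position in each charset or is missing entirely. A small Python check, using only the standard codecs:

```python
euro = "\u20ac"

print(euro.encode("iso-8859-15"))  # b'\xa4'
print(euro.encode("cp1252"))       # b'\x80'

# ISO-8859-1 has no Euro sign at all.
try:
    euro.encode("iso-8859-1")
except UnicodeEncodeError:
    print("not representable in iso-8859-1")

# The byte that means Euro in ISO-8859-15 is the generic currency sign
# (U+00A4) when the message is labelled ISO-8859-1.
print(f"U+{ord(bytes([0xA4]).decode('iso-8859-1')):04X}")  # U+00A4
```

So a Euro sign sent as byte 164 is only rendered correctly if the message is labelled ISO-8859-15 and the recipient honours that label.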


Claus
--
Don't use the "From: " for e-mail replies, it's nothing but a waste basket.
Use the "Reply-To: " instead. However, a message important enough to be sent
by mail should be important enough to be encrypted with PGP or GnuPG (look
in the headers). Otherwise you might not be recognized and won't get a reply.
 
 
 


Post by RobertS » Fri, 13 Feb 2004 15:32:43


XXXX@XXXXX.COM ,
EA < XXXX@XXXXX.COM >:


You don't have to go as far as using a special utility.
In Pegasus 4.12a, Paste special options 2-4 will do that (Ctrl-Shift-V).

(In much older versions, I have used Paste unwrapped or Paste as quote for
that.)

Also, if you go to Outgoing Mail, Message Formatting, you will find there
two options, either of which will also cause that to always happen for the
regular paste operation. No prizes for guessing which options ;)

Finally, if you uncheck "Rich text" (Alt-t) just before sending that should
work too (though I haven't tried this particular one).

--
Robert
User in front of @ is fake, actual user is: mkarta