search & replace with octal notation

Post by Haines Brown » Sun, 04 Jan 2004 23:45:16


For example, in an (otherwise decimal notation) document that I'm
editing, I want to search for \302\223 and \203\224 and replace them
with <q> and </q> respectively.

It doesn't work. When I paste "\302\223" after the "Replace string"
prompt in the minibuffer and press <RET>, it converts itself to
"\201\302\236\263". Then I find that I can't type the "<q>"
characters properly after the "with" prompt: what I type comes out as
the wrong characters.
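One plausible reading of these byte sequences, offered as an assumption rather than a diagnosis: \302\223 is octal for the two bytes 0xC2 0x93, which is the UTF-8 encoding of U+0093, a C1 control code. In the Windows-1252 code page the single byte 0x93 is a left double quotation mark, so Windows-1252 quotes that were later converted as if they were Latin-1 would end up as exactly these bytes. The Python sketch here checks that reading and does the replacement on raw bytes, outside Emacs, using the two sequences exactly as quoted in the post; `replace_quote_bytes`, `OPEN_QUOTE` and `CLOSE_QUOTE` are hypothetical names.

```python
# Hypothetical helper: swap the quoted octal byte sequences for <q> and </q>
# at the byte level, so no editor decoding gets in the way.
OPEN_QUOTE = bytes([0o302, 0o223])   # \302\223 == b"\xc2\x93"
CLOSE_QUOTE = bytes([0o203, 0o224])  # \203\224, as given in the post

def replace_quote_bytes(data: bytes) -> bytes:
    return data.replace(OPEN_QUOTE, b"<q>").replace(CLOSE_QUOTE, b"</q>")

# Check the assumed origin: UTF-8 for U+0093, which is Windows-1252's 0x93.
decoded = OPEN_QUOTE.decode("utf-8")
assert decoded == "\x93"
print(decoded.encode("latin-1").decode("cp1252"))  # the left double quote “

print(replace_quote_bytes(b"a \302\223blan\203\224 b"))  # b'a <q>blan</q> b'
```

If the closing sequence were really \302\224 (the UTF-8 form of Windows-1252's right double quote, 0x94), only `CLOSE_QUOTE` would need to change; that is a guess the post does not confirm.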

--
Haines Brown
XXXX@XXXXX.COM
XXXX@XXXXX.COM
www.hartford-hwp.com

Post by David Kastrup » Sun, 04 Jan 2004 23:58:20

Haines Brown < XXXX@XXXXX.COM > writes:


Your version info is less than complete. For all you are telling,
you could be using vi. Try generating your reports with
M-x report-emacs-bug RET
which will include relevant version information.

--
David Kastrup, Kriemhildstr. 15, 44793 Bochum


Post by Haines Brown » Mon, 05 Jan 2004 01:12:51

David Kastrup < XXXX@XXXXX.COM > writes:


David, sorry about that: Emacs 21.2.1. I didn't think I was
experiencing a bug; I merely need to know how to carry out a simple
little search/paste operation involving octal notation. But I sent it
off as a bug report anyway, as you suggested.

--
Haines Brown
XXXX@XXXXX.COM
XXXX@XXXXX.COM
www.hartford-hwp.com

Post by Kai Grossjohann » Mon, 05 Jan 2004 02:07:42

Haines Brown < XXXX@XXXXX.COM > writes:


That's strange. What happens if you use C-q 3 0 2 RET to enter \302
and likewise for the other backslashed thingies?
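For reference, C-q followed by digits reads them in octal by default (the radix is controlled by the variable `read-quoted-char-radix`, which defaults to 8), so C-q 3 0 2 RET asks for character code 0o302. A small Python sketch converting the octal escapes mentioned in this thread; `octal_code` is just an illustrative name:

```python
def octal_code(digits: str) -> int:
    """Interpret a string of octal digits, as C-q does by default."""
    return int(digits, 8)

for digits in ("302", "223", "347"):
    code = octal_code(digits)
    print(f"\\{digits} = {code} decimal = {code:#04x}")
# \302 = 194 decimal = 0xc2
# \223 = 147 decimal = 0x93
# \347 = 231 decimal = 0xe7
```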

Kai

Post by Haines Brown » Mon, 05 Jan 2004 03:51:21

Kai Grossjohann < XXXX@XXXXX.COM > writes:


Here's what happens: for 302 I get Â; for 223, I get \223 (that is,
an uppercase A circumflex for the first, and the octal 223 reproduced
for the second). I'm not getting a conversion of the octal.
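A likely explanation, assuming the buffer treats single bytes as Latin-1 (ISO 8859-1): code 0o302 is the Latin-1 letter Â, while 0o223 falls in the C1 control range 0x80 to 0x9F, which has no printable glyph, so Emacs shows it as the escape \223. The sketch uses only Python's standard `unicodedata` module; `describe_latin1` is a hypothetical name.

```python
import unicodedata

def describe_latin1(code: int) -> tuple:
    """Return (character, Unicode category, name) for a Latin-1 byte."""
    ch = bytes([code]).decode("latin-1")
    return ch, unicodedata.category(ch), unicodedata.name(ch, "<control, no name>")

for code in (0o302, 0o223):
    print(f"\\{code:o}", describe_latin1(code))
# \302 ('Â', 'Lu', 'LATIN CAPITAL LETTER A WITH CIRCUMFLEX')
# \223 ('\x93', 'Cc', '<control, no name>')
```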

However, the plot thickens.

1. I went to manually remove and replace the octals, and found I
couldn't write the characters needed. For example, I have in the
document "Fran\347ois Duvalier" (not sure if this will get through,
but the string contains an octal 347 in place of the c-cedilla).

2. I deleted the octal 347 and went to insert the c-cedilla, but that
simply recreated the octal! Instead of a ç I get \347. Now, this is
really interesting: I'm running Emacs with a split screen and can paste
a c-cedilla into this message, but not into the text I'm editing in
the other window! That suggests something fishy about the text
itself rather than Emacs.

3. If I copy the string with the octal notation for c-cedilla from the
document and paste it into my Gnus message, here's what I get:
François Duvalier. The octal notation is understood in Gnus.

4. However, if I create an empty file in Emacs and paste the octals
into it instead of into this Gnus message, the octals are not
understood. For example, "I have seldom met a ?blan? or ?nèg
milat? who speaks of Dessalines rationally,..." It seems Gnus can
handle the octals, but a simple text file is at a loss. The
A-circumflex question mark here should be simply an open double
quotation mark at the beginning of the word "blan". Curiously,
though, the e-grave in "nèg" was handled OK (in the original
message it is octal 350).

5. The original text is simply an e-mail message I received in
2002. The header says: X-MIME-Autoconverted: from QUOTED-PRINTABLE
to 8bit by whitman.webster.edu id g9TMwb2a047563. I'm not sure what
this implies, but apparently the document's character notation
causes emacs to gag.
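The X-MIME-Autoconverted header in item 5 hints at what happened to the text: the mail host named there decoded the quoted-printable body into raw 8-bit bytes. In quoted-printable, a c-cedilla in a Latin-1 message is written =E7; after conversion it becomes the single byte 0xE7, octal 347. Those bytes only make sense together with the message's declared charset, and an editor opening the saved message as plain bytes has to guess. A minimal sketch with Python's `quopri` module; the sample string is made up for illustration:

```python
import quopri

qp_body = b"Fran=E7ois Duvalier"          # hypothetical quoted-printable text
eight_bit = quopri.decodestring(qp_body)  # what the "to 8bit" step produces

print(eight_bit)                    # b'Fran\xe7ois Duvalier'
print(oct(eight_bit[4]))            # 0o347, the \347 seen in Emacs
print(eight_bit.decode("latin-1"))  # François Duvalier, if the charset is Latin-1
```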

(Two days ago was the 200th anniversary of the Haitian Revolution, and
hence I have to worry about Duvalier, the "blancs", and the "nèg
milat".)

--
Haines Brown
XXXX@XXXXX.COM
XXXX@XXXXX.COM
www.hartford-hwp.com

Post by David Kastrup » Mon, 05 Jan 2004 04:08:30


I read in your headers that you are still using 21.2. Are you aware
that 21.3 is a pure bugfix release that, among other things, deals with
international characters and copy&paste problems?

--
David Kastrup, Kriemhildstr. 15, 44793 Bochum

Post by Kai Grossjohann » Mon, 05 Jan 2004 07:15:53

Haines Brown < XXXX@XXXXX.COM > writes:


Hm.

What does Emacs say if you C-u C-x = on a c-cedilla that looks right,
and what does it say on a \347?

Kai

Post by Haines Brown » Mon, 05 Jan 2004 11:05:51

Kai Grossjohann < XXXX@XXXXX.COM > writes:


I'm having no trouble inserting a c-cedilla (ç) in a text file or
message, but it seems the particular message file I'm trying to edit is
the culprit. Perhaps it is 16-bit?


character: (04347, 2279, 0x8e7)
charset: latin-iso8859-1 (Right-Hand Part of Latin Alphabet 1
(ISO/IEC 8859-1): ISO-IR-100)
code point: 103
syntax: word
category: l:Latin
buffer code: 0x81 0xE7
file code: 0x81 0xE7 (encoded by coding system emacs-mule-unix)
font:
-B&H-LucidaTypewriter-Bold-R-Normal-Sans-12-120-75-75-M-70-ISO8859-1

This, of course, is based on the c-cedilla I inserted above in this
message. My problem may be, as I suggested, with the original file
itself. So let me turn to it.

I find that if I select the entire \347, the C-u C-x = simply reports
the following letter (o), but if I simply place the insertion point on
the escape back slash, I get:

character: (0347, 231, 0xe7)
charset: eight-bit-graphic (8-bit graphic char (0xA0..0xFF))
code point: 231
syntax: whitespace
category:
buffer code: 0xE7
file code: 0xE7 (encoded by coding system raw-text-unix)
font:
-B&H-LucidaTypewriter-Bold-R-Normal-Sans-12-120-75-75-M-70-ISO8859-1
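The three numbers in each "character:" line are one value written in octal, decimal and hexadecimal, which is easy to check. In the first report, "code point: 103" appears to be the character's position within the 96-character right-hand half of Latin-1, i.e. 0xE7 with the high bit cleared; that reading is an inference from the numbers shown, not something the report states. The second report's file code 0xE7 is the same byte, but there Emacs has no charset for it. `same_value` is a hypothetical helper:

```python
def same_value(octal: str, decimal: int, hexadecimal: str) -> bool:
    """True if the octal, decimal and hex spellings denote one number."""
    return int(octal, 8) == decimal == int(hexadecimal, 16)

print(same_value("4347", 2279, "8e7"))  # True: the properly decoded c-cedilla
print(same_value("347", 231, "e7"))     # True: the raw byte in the problem file
print(0xE7 & 0x7F)                      # 103, the Latin-1 "code point"
```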

What's an "eight-bit-graphic character"? This is all very interesting,
but I've no idea what it means. I'm having trouble finding C-u C-x = in
emacs Info.

--
Haines Brown
XXXX@XXXXX.COM
XXXX@XXXXX.COM
www.hartford-hwp.com

Post by Kai Grossjohann » Mon, 05 Jan 2004 22:44:49

Haines Brown < XXXX@XXXXX.COM > writes:


When Emacs sees a random non-ascii byte and doesn't know which
character set it might be from, then it declares that byte as
eight-bit-graphic. For example, if you read a file as latin-1 that
contains a byte that isn't part of the latin-1 encoding, then that
byte will be declared to be eight-bit-graphic.

So Emacs is telling you: I have seen a byte in the input, but I don't
know which charset it belongs to.
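Python has a loosely analogous mechanism, shown here only as an analogy and not as how Emacs works internally: when bytes cannot be decoded in the chosen encoding, the `surrogateescape` error handler keeps each undecodable byte as a stand-in code point (U+DC80 to U+DCFF), so nothing is lost and the original bytes can be written back unchanged, much as an eight-bit-graphic character keeps the raw byte.

```python
data = b"Fran\347ois"  # the 0xE7 byte is not valid UTF-8 here

try:
    data.decode("utf-8")
except UnicodeDecodeError as err:
    print("undecodable byte at", err.start, hex(data[err.start]))  # ... at 4 0xe7

# Keep the unknown byte instead of failing or dropping it.
text = data.decode("utf-8", errors="surrogateescape")
print([hex(ord(c)) for c in text if ord(c) > 0x7F])  # ['0xdce7']

# Writing it back reproduces the original bytes exactly.
print(text.encode("utf-8", errors="surrogateescape") == data)  # True
```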


Look for C-x =. The C-u is just a prefix arg. For example, C-f is
mentioned in the manual, and C-u C-f means to do C-f four times. (C-u
C-u C-f means 16 times, etc.) In the case of C-x =, the prefix arg
doesn't mean "four times", it just means "print more information".

Kai

Post by Haines Brown » Mon, 05 Jan 2004 23:50:19

Kai Grossjohann < XXXX@XXXXX.COM > writes:


Thanks, that is helpful, but does not take me far enough. Is the
character set of a document defined in a document attribute or is it
inferred from a scan of the document's characters?

If the former, how does one ascertain a document's character set
by displaying that attribute?

If the latter, can I infer that my problematic document either has a
character set unknown to my emacs setup or has characters from more
than one character set? In either case, what does one do about it?

...

I got the "C-x =" OK, but still find C-u obscure. In Info it is
described as a "universal argument" that allows what follows it to
serve as an argument. But what then is "C-x =" an argument for? Or is
"=" the argument for "C-x"? Or am I complicating things too much, and
the C-u merely says to print the result of the command that follows?

--
Haines Brown
XXXX@XXXXX.COM
XXXX@XXXXX.COM
www.hartford-hwp.com

Post by rydis » Tue, 06 Jan 2004 00:24:30

Haines Brown < XXXX@XXXXX.COM > writes:

It can be both ways. If you don't have anything in the document that
specifies the character encoding, heuristics are used.


In Emacs, at least, that's plainly visible, using file variables.
(Text specifying the values of some variables; either on the first
line or the last few lines.)


You can see what Emacs thinks about your current coding system, with
C-h C , and you can use C-x RET f to change what external
representation should be used.
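To make "external representation" concrete: the same characters become different bytes depending on the coding system used when the file is written, which is what C-x RET f changes. A short Python illustration, using Python's codec names rather than Emacs's:

```python
text = "François"
for coding in ("latin-1", "utf-8"):
    encoded = text.encode(coding)
    print(coding, encoded, "c-cedilla as octal:",
          " ".join(f"\\{b:o}" for b in encoded if b > 0x7F))
# latin-1 b'Fran\xe7ois' c-cedilla as octal: \347
# utf-8 b'Fran\xc3\xa7ois' c-cedilla as octal: \303\247
```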


C-u is used differently in different functions; it also has a few
convenience usages that might confuse. For instance, the only thing
C-x = cares about is whether the argument asking for more detail is
present (non-nil) or not. C-u <n> is used to give numerical
arguments. For convenience, C-u /without/ an <n> represents 4, C-u C-u
represents 16, and so on. So the command called by C-u C-x = is
actually (what-cursor-position '(4)): a bare C-u passes the raw prefix
argument (4), a list whose numeric value is 4. Since the optional
argument DETAIL is non-nil, additional detail is given.
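The arithmetic of numeric prefix arguments described here can be modelled in a few lines. This is a hypothetical Python model of the numeric value only (the names `prefix_value` and `wants_detail` are made up); Emacs itself also distinguishes a raw prefix argument, which is what a command like C-x = inspects.

```python
from typing import Optional

def prefix_value(c_u_presses: int, digits: Optional[str] = None) -> int:
    """Numeric value after pressing C-u some number of times,
    optionally followed by digits (C-u 4 2 -> 42)."""
    if digits:
        return int(digits)
    return 4 ** c_u_presses  # no prefix -> 1, C-u -> 4, C-u C-u -> 16

def wants_detail(c_u_presses: int) -> bool:
    """C-x = only asks whether any prefix argument was given at all."""
    return c_u_presses > 0

print(prefix_value(0), prefix_value(1), prefix_value(2), prefix_value(1, "42"))
# 1 4 16 42
```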

Regards,

'mr

--
[Emacs] is written in Lisp, which is the only computer language that is
beautiful. -- Neal Stephenson, _In the Beginning was the Command Line_

Post by Reiner Ste » Tue, 06 Jan 2004 00:31:49


,----[ From the Tutorial (see `C-h t' or help menu) ]
| Most Emacs commands accept a numeric argument; for most commands, this
| serves as a repeat-count. The way you give a command a repeat count
| is by typing C-u and then the digits before you type the command. If
| you have a META (or EDIT or ALT) key, there is another, alternative way
| to enter a numeric argument: type the digits while holding down the
| META key. We recommend learning the C-u method because it works on
| any terminal. The numeric argument is also called a "prefix argument",
| because you type the argument before the command it applies to.
`----

The tutorial probably is something that every Emacs user should read.

In (info "(emacs)Arguments"), which is the first match if you do an
index search for `C-u', you'll probably find answers to all your
questions raised above.

Bye, Reiner.
--
,,,
(o o)
---ooO-(_)-Ooo--- PGP key available via WWW http://www.yqcomputer.com/

Post by Kai Grossjohann » Tue, 06 Jan 2004 00:43:22

Haines Brown < XXXX@XXXXX.COM > writes:


In addition to what Martin said, I'd like to point out that you can
put strings like "-*- coding: iso-8859-1; -*-" (without quotes) into
the first line of a file to specify the encoding used in the file.

If such a string is not present, then it depends on your "language
environment" how Emacs guesses what the contents are. For example, in
a Latin-1 language environment, Emacs will assume Latin-1 in most
cases, except when it is obvious that the file is in another encoding.


In addition to the string in the first line, there can also be a Local
Variables section near the end of the file.
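A rough sketch of the order of precedence described here, assuming a first-line "-*- coding: ... -*-" cookie wins, then a coding: entry in a Local Variables section near the end of the file, and otherwise a guess. The 3000-byte tail window and the "try UTF-8, else fall back to Latin-1" guess are simplifying assumptions standing in for the language environment; this is not Emacs's actual detection code, and the function names are hypothetical.

```python
import re
from typing import Optional

# "-*- coding: iso-8859-1; -*-" on the first line
FIRST_LINE_COOKIE = re.compile(rb"-\*-.*?coding:\s*([-\w.]+)")
# Local Variables: ... coding: latin-1 ... End:
LOCAL_VARIABLES = re.compile(rb"Local Variables:.*?coding:\s*([-\w.]+).*?End:", re.S)

def declared_coding(data: bytes) -> Optional[str]:
    first_line = data.split(b"\n", 1)[0]
    match = FIRST_LINE_COOKIE.search(first_line)
    if match is None:
        match = LOCAL_VARIABLES.search(data[-3000:])  # assumed tail window
    return match.group(1).decode("ascii") if match else None

def guess_coding(data: bytes, default: str = "latin-1") -> str:
    declared = declared_coding(data)
    if declared:
        return declared
    try:
        data.decode("utf-8")  # stand-in heuristic, not Emacs's
        return "utf-8"
    except UnicodeDecodeError:
        return default        # e.g. a Latin-1 language environment

print(guess_coding(b"-*- coding: iso-8859-1; -*-\nFran\347ois\n"))  # iso-8859-1
print(guess_coding(b"Fran\347ois\n"))                               # latin-1
```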


It could also be that the document is just broken. The solution is to
repair it. For example, using C-u C-s [ ^ C-q 0 0 0 RET - C-q 1 7 7
RET ] you can search for a non-ascii character in the file. Then you
can think about what that character should be, and just replace it by
the "proper" character.
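The character class in that search, [^\000-\177], matches any byte outside 7-bit ASCII. The same search over the file's raw bytes in Python, listing each offset with its octal value so it can be compared with what Emacs displays (`non_ascii_positions` is an illustrative name):

```python
import re

NON_ASCII = re.compile(rb"[^\x00-\x7f]")  # same class as [^\000-\177]

def non_ascii_positions(data: bytes) -> list:
    """Return (offset, octal value) for every byte outside 7-bit ASCII."""
    return [(m.start(), oct(m.group()[0])) for m in NON_ASCII.finditer(data)]

print(non_ascii_positions(b"Fran\347ois \302\223blan"))
# [(4, '0o347'), (9, '0o302'), (10, '0o223')]
```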

Or Emacs was reading the document in a certain context that was
wrong. For example, saying C-x C-m c latin-1 RET C-x C-f /some/file
RET will cause Emacs to read /some/file in the Latin-1 encoding. This
way, you can force the encoding that Emacs should use.
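The effect of forcing the coding system when reading can be imitated in Python by passing an explicit encoding to `open`; this is an analogy, not what Emacs does internally, and the temporary file and its contents are made up for the example. With Latin-1 the byte 0xE7 comes back as ç; under UTF-8 it is invalid, and `errors="replace"` substitutes U+FFFD.

```python
import os
import tempfile

def read_as(path: str, coding: str) -> str:
    # errors="replace" turns undecodable bytes into U+FFFD instead of raising
    with open(path, encoding=coding, errors="replace") as f:
        return f.read()

fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as f:
        f.write(b"Fran\347ois Duvalier\n")
    print(read_as(path, "latin-1"), end="")  # François Duvalier
    print(read_as(path, "utf-8"), end="")    # U+FFFD in place of the c-cedilla
finally:
    os.remove(path)
```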


C-u is an argument for what follows it. So C-u is the argument, and
C-x = is the command.

If you type C-u C-f then C-f gets the argument 4. If you type C-u C-u
C-f then C-f gets the argument 16. If you type C-u 4 2 C-f, then C-f
gets the argument 42. Clear(er)?

Kai