I am looking for a (preferably freeware) document filter API that covers a
wide range of document formats, e.g. HTML, Excel, Word DOC, Access MDB, etc.
To clarify, I want a set of libraries exposing functions that look like:
HANDLE FindFirstWord(const char *szDocumentName); // opens a search on the document
BOOL FindNextWord(HANDLE h, char *szWord, int nChars); // copies the next content word
// (i.e., with formatting stripped out) into szWord; returns FALSE when no words remain
void Close(HANDLE h); // closes the search
These should turn docs that look like:
<FONT > courier</FONT><FONTSIZE>12</FONTSIZE>
<TEXT>The quick red fox etc.</TEXT>
into a series of plain content words like:
"The", "quick", "red", "fox", ...
And it needs to work for a bunch of formats.
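To make the request concrete, here is a rough sketch in C of the kind of word iterator I mean, written only for the toy markup above. Everything in it is hypothetical: the names WordCursor, word_cursor_open and word_cursor_next are made up for illustration, it reads from an in-memory string rather than a file, and it assumes that only text between <TEXT> and </TEXT> counts as content. A real filter library would need a parser per format.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cursor over an in-memory document; stands in for the HANDLE. */
typedef struct {
    const char *pos;  /* next unread character */
    int in_text;      /* nonzero while inside <TEXT>...</TEXT> */
} WordCursor;

/* Start a scan over a NUL-terminated buffer holding the toy markup. */
static void word_cursor_open(WordCursor *c, const char *doc)
{
    assert(c != NULL && doc != NULL);
    c->pos = doc;
    c->in_text = 0;
}

/* Copy the next content word into out (capacity n, always NUL-terminated).
   Words longer than n - 1 characters are truncated.
   Returns 1 if a word was produced, 0 at the end of the document. */
static int word_cursor_next(WordCursor *c, char *out, size_t n)
{
    assert(c != NULL && out != NULL && n > 0);
    for (;;) {
        char ch = *c->pos;
        if (ch == '\0')
            return 0;
        if (ch == '<') {
            /* Assumption: only <TEXT> elements carry content. */
            if (strncmp(c->pos, "<TEXT>", 6) == 0)
                c->in_text = 1;
            else if (strncmp(c->pos, "</TEXT>", 7) == 0)
                c->in_text = 0;
            /* Skip the rest of the tag, including the closing '>'. */
            while (*c->pos != '\0' && *c->pos != '>')
                c->pos++;
            if (*c->pos == '>')
                c->pos++;
            continue;
        }
        if (!c->in_text || !isalnum((unsigned char)ch)) {
            /* Formatting data, whitespace, or punctuation: not a word. */
            c->pos++;
            continue;
        }
        /* At the first character of a content word: copy it out. */
        size_t len = 0;
        while (isalnum((unsigned char)*c->pos)) {
            if (len + 1 < n)
                out[len++] = *c->pos;
            c->pos++;
        }
        out[len] = '\0';
        return 1;
    }
}
```

Calling word_cursor_open on the sample document and then word_cursor_next in a loop until it returns 0 yields the words from the <TEXT> element one at a time, skipping the font name and size. What I am after is a library that does the equivalent for real formats.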
Verity has a commercial product - is there anything else out there?
I am on Windows BTW.