um, sorry, folks, my fault: i wasn't clear in my request for fb2 challenge functions. i should have refreshed your memory on the exact nature of the challenge. if you'll remember, it was for routines that would examine an arbitrary text file (which could be as big as 5 megabytes) and create a list of all the words used in it, with a count of how many times each was used.

but never mind that if you don't want to; let me tell you what _i_ actually need for _my_ purpose, which is (of course, as you know) within an e-book program. i need a list of all unique strings contained in the text, the delimiter being a space and/or a return (or multiples of 'em). the list _is_ sensitive to case and to leading/trailing punctuation, so "the", "The" and "the," count as three different strings. additionally, this list should be saved in sorted order, with the sort also being case- and punctuation-sensitive.

next, the list should be split and saved across up to 245 str# resources, each holding no more than 245 items. (245 x 245 = 60,025, so the maximum list is just over 60,000 unique strings.) also, of course, the original list should be stored in a str# resource as well (or in two, if it has more than _maxint items), and likewise its sorted counterpart.

each delimiter-separated string in the original text must also be replaced with a two-character pointer token, where the first character points to one of the 245 str# resources and the second points to a specific item within that str#. this tokenized text should then be saved in one or more str#s.

so also humbly requested is a routine to do this tokenization. it could run _after_ the string-list generation, or as code contained _within_ that function; either would be fine. (if it's done afterwards, we would want to put the _sorted_ version of the string list into the str# resources and use it. but if tokenization is done in the _process_ of string collection, you'd only be able to tokenize based on the _unsorted_ version.)
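here's a minimal sketch of the string-list part, written in python purely to pin down the behaviour (it isn't fb2 code and isn't anyone's challenge entry). it splits on spaces and returns, keeps each distinct string as raw bytes so that case and punctuation both count, sorts in plain byte order, and cuts the sorted list into chunks of at most 245 for the str# resources. it assumes returns are byte 13 (classic mac cr). the function names unique_strings, chunk_for_str_resources and build_index are hypothetical.

```python
# Sketch: collect unique strings, sort them, and lay them out in STR#-sized chunks.
# ASSUMPTION: a "return" is byte 13 (CR, classic Mac line ending).

MAX_ITEMS = 245      # items per STR# resource (from the spec)
MAX_RESOURCES = 245  # number of STR# resources available (from the spec)


def unique_strings(data: bytes) -> list[bytes]:
    """Return each distinct space/return-delimited string, in first-seen order.

    Comparison is on raw bytes, so b"The", b"the" and b"the," are all distinct.
    """
    seen = set()
    ordered = []
    for line in data.split(b"\r"):
        for word in line.split(b" "):
            # empty pieces come from runs of spaces or empty lines; skip them
            if word and word not in seen:
                seen.add(word)
                ordered.append(word)
    return ordered


def chunk_for_str_resources(words: list[bytes]) -> list[list[bytes]]:
    """Split a list into consecutive chunks of at most MAX_ITEMS strings."""
    limit = MAX_ITEMS * MAX_RESOURCES
    if len(words) > limit:
        raise ValueError(f"{len(words)} unique strings exceeds {limit}")
    return [words[i:i + MAX_ITEMS] for i in range(0, len(words), MAX_ITEMS)]


def build_index(chunks: list[list[bytes]]) -> dict[bytes, tuple[int, int]]:
    """Map each string to its 1-based (resource number, item number) pair."""
    return {
        word: (res_no, item_no)
        for res_no, chunk in enumerate(chunks, start=1)
        for item_no, word in enumerate(chunk, start=1)
    }
```

for example, unique_strings(b"The cat sat.\r\rthe cat,  sat on The mat.\r") gives The, cat, sat., the, cat,, sat, on, mat. in first-seen order (that's the "original" list). passing it through sorted() puts The first, since uppercase sorts before lowercase in byte order, and cat before cat,. writing the chunks into actual str# resources is left out here, since that part is toolbox-specific.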
i should also mention that the tokenized file must keep the returns of the original file, so it will contain as many lines as the original. (i'm working with project gutenberg text files, and the return at the end of each line is significant.) spaces, however, are _not_ kept in the tokenized file: every string is expected to be followed by a single space, so that space is assumed and generated automatically.

i've written routines that do all this, but slowly. you're welcome to look at them if you think it would help, but somehow i doubt a line-input type of approach would... :+)

-bowerbird
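a matching sketch of the tokenizer, again in python as illustration only. each line is tokenized separately, so returns survive and the output has the same number of lines as the input; spaces are dropped, and a single space is put back between strings on decoding. the code table is my assumption, not part of the spec: the token bytes are the values 0-255 with 13 removed, first 245 kept, so a token byte can never be mistaken for a return. whether a space should also follow the last string on a line is left open; this sketch doesn't emit one. all names here (CODES, tokenize, detokenize, build_index) are hypothetical.

```python
# Sketch: two-byte tokenization that keeps returns and drops spaces.
# ASSUMPTIONS (not from the spec): returns are byte 13 (CR), and the
# 245 values a token byte can take are bytes 0-255 minus 13, first 245
# kept, so a token byte can never look like a return.

CR = 13
CODES = bytes(b for b in range(256) if b != CR)[:245]    # hypothetical code table
DECODE = {code: n for n, code in enumerate(CODES, start=1)}


def build_index(data: bytes) -> tuple[list[bytes], dict[bytes, tuple[int, int]]]:
    """Sorted unique strings plus a string -> 1-based (resource, item) map."""
    words = sorted({w for line in data.split(b"\r") for w in line.split(b" ") if w})
    if len(words) > 245 * 245:
        raise ValueError("too many unique strings for 245 STR# resources")
    index = {w: (n // 245 + 1, n % 245 + 1) for n, w in enumerate(words)}
    return words, index


def tokenize(data: bytes, index: dict[bytes, tuple[int, int]]) -> bytes:
    """Replace each string with two token bytes; keep returns, drop spaces."""
    out_lines = []
    for line in data.split(b"\r"):
        tokens = bytearray()
        for word in line.split(b" "):
            if word:
                res, item = index[word]
                tokens += bytes((CODES[res - 1], CODES[item - 1]))
        out_lines.append(bytes(tokens))
    return b"\r".join(out_lines)          # same number of lines as the input


def detokenize(tokenized: bytes, words: list[bytes]) -> bytes:
    """Rebuild the text, regenerating one space between strings on a line."""
    out_lines = []
    for line in tokenized.split(b"\r"):
        strings = []
        for k in range(0, len(line), 2):
            res, item = DECODE[line[k]], DECODE[line[k + 1]]
            strings.append(words[(res - 1) * 245 + (item - 1)])
        out_lines.append(b" ".join(strings))
    return b"\r".join(out_lines)
```

a run of several spaces comes back as one, and leading spaces on a line are lost, which is what the "spaces are not maintained" rule implies; the returns, and therefore the line count, come back exactly.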