Steve Carter on Software Development

Software Development weblog. Mostly about my THoTH Music Learning Software.

Saturday, December 04, 2004

Indexing Player's Journal

How to archive and index Player's Journal. First, standardize the entry dates to mm/dd/yyyy. That will facilitate automated searching and indexing. For the initial pass at archiving I'll manually extract sections by year. Create one HTML file for each year, leaving the current year in the main page. At the top of the main page:
Archives: 2002 2003 2004 Index
with each year a hyperlink to that year's archive. Index will be a link to the index page.
Preparation for indexing. Create 2 CSS classes: entrydate and entrytitle. Then I can search for those; they mark the beginning of an entry. An hr tag can mark the end of an entry. Currently I can have multiple entry titles within a date. I might need to change that to make indexing and linking easier.
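A rough sketch of how entries could be pulled apart using those markers, written in Python rather than Perl purely for illustration. Only the class names entrydate and entrytitle come from the plan; the p tags, the sample entries, and the function name split_entries are assumptions. It also assumes exactly one title per date, which is the simplification considered above.

```python
import re

# Hypothetical markup: the <p> tags and sample text are assumptions; only the
# class names entrydate and entrytitle come from the plan.
SAMPLE = """\
<p class="entrydate">11/12/2003</p>
<p class="entrytitle">Blue Bossa</p>
<p>Worked on the melody.</p>
<hr>
<p class="entrydate">12/04/2004</p>
<p class="entrytitle">Tribute to Bill</p>
<p>Chord etudes today.</p>
<hr>
"""

# An entry starts at the entrydate marker and ends at the next hr tag.
ENTRY_RE = re.compile(
    r'<p class="entrydate">(?P<date>.*?)</p>\s*'
    r'<p class="entrytitle">(?P<title>.*?)</p>'
    r'(?P<body>.*?)<hr\s*/?>',
    re.DOTALL | re.IGNORECASE,
)

def split_entries(html):
    """Return one dict per entry with its date, title, and body markup."""
    return [
        {
            "date": m.group("date").strip(),
            "title": m.group("title").strip(),
            "body": m.group("body").strip(),
        }
        for m in ENTRY_RE.finditer(html)
    ]

entries = split_entries(SAMPLE)
for e in entries:
    print(e["date"], e["title"])
```

If several titles are allowed under one date, the pattern would need to treat the title, not the date, as the start of an entry.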
Indexing. Rejected idea: make page a form and insert hidden fields with keywords. Do I want to hide the keywords? More readable if I do, easier to edit if I don't. There will be topics for entries where the keyword of the topic does not appear in the text. Example: an entry about Bill Leavitt. Topics: William G. Leavitt; music education; Berklee. Idea: a line or more of keywords, maybe grey and in 7-point type so unobtrusive, at the bottom of an entry. Then I could write a Perl script to create the index. But where do I put the anchors? On the keywords, I guess. But then the link in the index takes you to the bottom of the entry--not good. Maybe keywords at the top of the entry.
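Reading a keyword line back out of an entry might look something like this. It is a Python sketch standing in for the planned Perl script; the "keywords" class name and the function extract_keywords are hypothetical, and the semicolon separator is taken from the topics example above.

```python
import re

# Hypothetical markup: a "keywords" class does not exist in the journal yet.
KEYWORDS_RE = re.compile(r'<p class="keywords">(.*?)</p>', re.DOTALL)

def extract_keywords(entry_html):
    """Return the topics from the first keywords line, or [] if there is none."""
    m = KEYWORDS_RE.search(entry_html)
    if not m:
        return []
    # Topics are separated by semicolons; drop empty pieces and extra spaces.
    return [k.strip() for k in m.group(1).split(";") if k.strip()]

entry = (
    '<p class="keywords">William G. Leavitt; music education; Berklee</p>\n'
    "<p>An entry about Bill Leavitt.</p>"
)
topics = extract_keywords(entry)
print(topics)
```

Because the script looks for the class and not a position, the same code works whether the keyword line ends up at the top or the bottom of the entry.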
I can't remember seeing an index like this on the web. A table of contents is common, and some people even label those as index, but they are not indexes.
So if I had these keywords in place, how do I proceed? What text for the index links? In a book, it's the page number. I could just use the entry date, but that's not very informative. Maybe entrydate and entrytitle. Example:
Berklee
11/12/2003 Blue Bossa
12/04/2004 Tribute to Bill
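One way the index layout above could be produced once each entry has its keywords. This is a Python sketch, not the Perl script itself; the entry list, the "bossa nova" topic, and the function names build_index and render_index are made up for illustration.

```python
from collections import defaultdict

# Made-up entries for illustration: (date, title, keywords).
entries = [
    ("11/12/2003", "Blue Bossa", ["Berklee", "bossa nova"]),
    ("12/04/2004", "Tribute to Bill", ["William G. Leavitt", "Berklee"]),
]

def build_index(entries):
    """Map each keyword to the (date, title) pairs of the entries that carry it."""
    index = defaultdict(list)
    for date, title, keywords in entries:
        for kw in keywords:
            index[kw].append((date, title))
    return index

def render_index(index):
    """Render keyword headings, each followed by its 'date title' lines."""
    lines = []
    for kw in sorted(index, key=str.lower):  # case-insensitive alphabetical order
        lines.append(kw)
        for date, title in index[kw]:
            lines.append(f"{date} {title}")
        lines.append("")  # blank line between keyword groups
    return "\n".join(lines)

index = build_index(entries)
text = render_index(index)
print(text)
```

Entries appear under a keyword in the order they were read, so reading the journal oldest first keeps each group chronological.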

That's not bad. So how would I do this in Perl? Read an entry and then --
Hmmm. Maybe I want certain words to be hyperlinks into the index. The phrase "chord etude" might link to that index entry. That entry might contain a see also section, which contains a link to, say, William G. Leavitt - chord etudes.
It could be fun adding this index information to the journal!
The idea of linking a word to the index solves the problem of one word linking to many places. Ray Kurzweil's site has an elaborate many-to-many linking (The Brain) and I've seen other things like that on the web for links among web sites, but they are too intimidating for my purposes in this player's journal.
So, Perl for creating the index. Read a section - delimited by entrydate tag at start and hr tag at end. Build a date string from the date entry: 11/22/2004 becomes 20041122, so the ids sort by date. Build a title string from the entrytitle: Tribute to Bill becomes tribute_to_bill. Create an id: 20041122_tribute_to_bill. Insert an anchor in the journal page with that id. (at the entrydate text?) Write an index entry that is a link to that anchor. That way I can have many index entries linked to any given entry.
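The id construction could be sketched like this, in Python as a stand-in for the Perl. It assumes dates arrive as mm/dd/yyyy and puts year, month, and day in that order so ids sort chronologically; the function names are hypothetical.

```python
import re

def date_string(date):
    """Turn mm/dd/yyyy into yyyymmdd (assumed input and output order)."""
    month, day, year = date.strip().split("/")
    return f"{year}{int(month):02d}{int(day):02d}"

def title_string(title):
    """Lowercase the title and join its words with underscores."""
    slug = re.sub(r"[^a-z0-9]+", "_", title.lower())
    return slug.strip("_")

def entry_id(date, title):
    """Combine the date string and title string into one anchor id."""
    return f"{date_string(date)}_{title_string(title)}"

print(entry_id("11/22/2004", "Tribute to Bill"))
```

One thing to watch: HTML 4 requires id values to begin with a letter, so a prefix such as "e" in front of the date may be needed if the page is to validate.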
Try this on a test file. Create a file journal.html with a few dummy entries. Write Perl to insert the anchors and to write the index file. This means reading the input file, adding the anchors to it as it is output, and also writing to the index file for each entry as it is processed.
In the actual journal file, I might want a CSS class for the journal title; then I could use that as a delimiter to indicate to Perl where to start processing.
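A minimal sketch of that single pass over a dummy journal, again in Python rather than Perl. It works on strings instead of files to keep it short; the p-tag markup, the journal.html link target, and the names make_id and process are assumptions, and the index here lists entries by date and title only, without keywords.

```python
import re

# Hypothetical markup: an entry starts with an entrydate line followed by an
# entrytitle line. Dates are assumed to be mm/dd/yyyy.
ENTRY_START = re.compile(
    r'<p class="entrydate">(?P<date>\d{1,2}/\d{1,2}/\d{4})</p>\s*'
    r'<p class="entrytitle">(?P<title>.*?)</p>'
)

def make_id(date, title):
    """Build an id such as 20041204_tribute_to_bill from a date and title."""
    month, day, year = date.split("/")
    slug = re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")
    return f"{year}{int(month):02d}{int(day):02d}_{slug}"

def process(journal_html, journal_name="journal.html"):
    """Return (journal with anchors inserted, index markup) in one pass."""
    index_lines = []

    def add_anchor(m):
        entry_id = make_id(m.group("date"), m.group("title"))
        # One index line per entry, linking back to the new anchor.
        index_lines.append(
            f'<a href="{journal_name}#{entry_id}">'
            f'{m.group("date")} {m.group("title")}</a><br>'
        )
        # Put the anchor just before the entrydate text.
        return f'<a id="{entry_id}"></a>{m.group(0)}'

    annotated = ENTRY_START.sub(add_anchor, journal_html)
    return annotated, "\n".join(index_lines)

DUMMY = """\
<p class="entrydate">11/12/2003</p>
<p class="entrytitle">Blue Bossa</p>
<p>Dummy text.</p>
<hr>
<p class="entrydate">12/04/2004</p>
<p class="entrytitle">Tribute to Bill</p>
<p>More dummy text.</p>
<hr>
"""

annotated, index_html = process(DUMMY)
print(annotated)
print(index_html)
```

Running it twice on the same file would insert duplicate anchors, so a real script should either work from an unanchored master copy or skip entries that already have one.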


