Installing the FREEBIE Package

Your Web Site, Search Engines, and a Bookshop

This document continues on from StartHere.html.

7. Now it's time to generate some sample results. You're going to run the script dofind.php, which will do a search on the default sample term "Spokane" supplied in customize.inc (useful because it supplies perhaps 30 titles in a search that runs in seconds). When you click on the link that you'll be given a little below, the script will run and produce an output screen. It will show a column of entries of the form INDEX nn; each of those is a 10-book page fetched from Amazon as XML (which you don't really need to know about). Since the script dofind.php also automatically runs the script holders.php, you will see listed first the 28 confirming file-make statements from it.

In sum, the browser screen you get should look something like this:


     · Alpha-Title file bookpages/book-titles-@.shtml
     · Alpha-Title file bookpages/book-titles-A.shtml
     · Alpha-Title file bookpages/book-titles-B.shtml
     · Alpha-Title file bookpages/book-titles-C.shtml
     · Alpha-Title file bookpages/book-titles-D.shtml
     · Alpha-Title file bookpages/book-titles-E.shtml
     · Alpha-Title file bookpages/book-titles-F.shtml
     · Alpha-Title file bookpages/book-titles-G.shtml
     · Alpha-Title file bookpages/book-titles-H.shtml
     · Alpha-Title file bookpages/book-titles-I.shtml
     · Alpha-Title file bookpages/book-titles-J.shtml
     · Alpha-Title file bookpages/book-titles-K.shtml
     · Alpha-Title file bookpages/book-titles-L.shtml
     · Alpha-Title file bookpages/book-titles-M.shtml
     · Alpha-Title file bookpages/book-titles-N.shtml
     · Alpha-Title file bookpages/book-titles-O.shtml
     · Alpha-Title file bookpages/book-titles-P.shtml
     · Alpha-Title file bookpages/book-titles-Q.shtml
     · Alpha-Title file bookpages/book-titles-R.shtml
     · Alpha-Title file bookpages/book-titles-S.shtml
     · Alpha-Title file bookpages/book-titles-T.shtml
     · Alpha-Title file bookpages/book-titles-U.shtml
     · Alpha-Title file bookpages/book-titles-V.shtml
     · Alpha-Title file bookpages/book-titles-W.shtml
     · Alpha-Title file bookpages/book-titles-X.shtml
     · Alpha-Title file bookpages/book-titles-Y.shtml
     · Alpha-Title file bookpages/book-titles-Z.shtml
     · Alpha-Title file bookpages/book-titles-~.shtml

 BEGUN 18

 INDEX 0
 INDEX 1
 INDEX 2
 INDEX 3
 INDEX 4
 INDEX 5
 INDEX 6
 INDEX 7
 INDEX 8
 INDEX 9
 INDEX 10
 INDEX 11
 INDEX 12
 INDEX 13
 INDEX 14
 INDEX 15
 INDEX 16
 INDEX 17
 FINISHED

 x "Spokane" job begun at Wednesday, 19 May 2004, 16:51:39;
 x "Spokane" job done at Wednesday, 19 May 2004, 16:51:59, having taken 0 minutes, 20 seconds.

(The BEGUN number is the number of data pages the script will have to fetch from Amazon; the INDEX entries are those pages being fetched and processed--10 titles a page, except possibly the last; and the rest is self-explanatory.)

Note! Some browsers do not display a "page" till they have loaded it in full, which--in this case--means till the php process is complete. So even if your browser screen does not change, as long as your browser's "loading page" indicator (typically a little image in the upper right of the browser window) shows activity, the search is in process: Don't Panic.


If your browser has "tabbed" screens, you can and should open "action" links given in this document file (PHP script starts, that is) as new tabs; if it doesn't, you're best to start them in a new browser window. (In most browsers, you do those things by right-clicking on the link--which typically brings up a popup menu from which you can select how you will deal with the link.) That way, you can start a script yet keep reading here while the script runs.

For this, and any other link on this document page, when you're through with whatever you need to see or do at that link, and did not use a separate browser tab or window for it, just use your browser's Back button to return to this docfile.


OK, click away and here we go (this should take about 20 seconds).


8. Now let's look at a sample results page.

Note! You will not, at this stage, get anything by clicking on individual book titles!
You still need to modify or install your .htaccess file before those links will work.

Let's look at the page for letter-M titles, which is book-titles-M.shtml. (There will be 28 similarly named files. 26 are obvious alpha files; of the other two, book-titles-@ holds titles that begin with non-alphabetic characters--typically numerals--and book-titles-~ holds titles that begin with your keyword search phrase, which are broken out as a separate page of listings so they don't clog up the alpha page they would otherwise fall under.)

For now, you are really just getting a feel for what you have. If the general look-and-feel of the page seems more or less OK, fine, but if not, you can completely modify every aspect of these title-listing pages except the individual-title blocks, and even for those you can customize the background color, the text color, and (within limits) the text size--so you really can get almost any look you want here.

(The drop-in-block values are set in your customize.inc file, as is explained in that file; the rest of the page you change by modifying holder.shtml, the template from which they all derive.) I strenuously recommend that you not rush into any such fiddling till you have finished the install and have thoroughly read all the package docfiles.

I have made all files that will actually be in your site .shtml files, so you can do server-side includes--and, in any event, some of them need to be for this package. (I have also tried to make them all, including the php-generated ones, scrupulously correct XHTML v. 1.0 Transitional, but you should use the W3C Validator to check them when you're finished customizing.) Later on, at your leisure, read the file CustomizingPlus for advice on how to customize these 28 alpha title-listing pages through customization of holder.shtml.


There will also be a 29th file, all-books.txt, that is a straight-text list of all the found books. It is provided so your visitors can search through it, with their browsers' Search feature or by downloading it and using a text editor, if they are looking for books with titles containing a given term (but not starting with that term), or for books by a particular author, or from a particular publisher, or whatever; it's just a little extra convenience.



9. Now it's time to determine the real Amazon keyword search word or phrase you will use to build your real results. You will do this by an iterative trial-and-error process in which you seek a plausibly site-relevant word or phrase that will give you an appropriate number of titles to list.


What is "appropriate"? There is no exact answer, but we can get at least a rough idea from simple calculation. You will get two new "pages" on your site for each title, so you might think the more the merrier, but there is a practical constraint: search-engine robots are widely believed to pick up no more than the first 101,000 bytes or so of any page.

You will have 28 pages listing your titles, and--from experience--a typical title listing is, very roughly, 560 bytes' worth. If we start with the incorrect but simplifying assumption that all your titles are distributed evenly across the alphabet, we can calculate a ceiling to effectiveness. The upper part of each listing page is itself about 4,500 bytes, so we can get roughly 96,500 bytes of title data (where each title datum leads to two new pseudo-pages for your site) on each of the 28 title-listing pages before we get to the point (101k) where the searchbots will ignore the rest of the page. That is roughly 175 titles a page; in other words, about 4,900 titles is the maximum you could effectively list (that is, that will be seen by today's searchbots), which is about 9,800 extra site pages for you.
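That arithmetic can be checked with a short Python sketch. The constant and function names are illustrative only (nothing in the package uses them), and the byte figures are simply the rough estimates quoted above.

```python
# Rough ceiling on the titles that search robots will see, using the
# estimates quoted in the text. All names here are illustrative only.
BOT_LIMIT_BYTES = 101_000   # bytes of a page a robot is believed to read
PAGE_TOP_BYTES = 4_500      # fixed upper part of each listing page
BYTES_PER_TITLE = 560       # very rough size of one title listing
LISTING_PAGES = 28          # 26 letters plus the @ and ~ pages


def titles_per_page():
    """Whole title listings that fit under the robot limit on one page."""
    return (BOT_LIMIT_BYTES - PAGE_TOP_BYTES) // BYTES_PER_TITLE


def max_visible_titles():
    """Ceiling across all listing pages, assuming an even spread."""
    return titles_per_page() * LISTING_PAGES


if __name__ == "__main__":
    print(titles_per_page())          # whole titles per page
    print(max_visible_titles())       # titles across all 28 pages
    print(max_visible_titles() * 2)   # two new site pages per title
```

Worked exactly, this gives 172 titles a page and 4,816 titles in all; the text rounds those to 175 and 4,900, which is well within the precision of the underlying estimates.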

Also: Amazon has foolishly reverted to its bizarre practice of only returning a maximum of 32,000 "raw" titles--which is typically about 9,600 listable titles--on any one search, so there is again a definite upper limit on how many you even can list.

(There's little point in trying to edit down the pagetop area: 4,500 bytes is only about 8 titles, and you have to have some text to open your page, so in practice you could add perhaps at most 50 pages more to your site--out of almost 10,000--and only by severely restricting the useful information visitors should get on each title page.)

But titles are not distributed evenly across the alphabet: the ratio of titles starting with S to titles starting with X can be from 50:1 to literally 1000:1 in a large sample. That means that to approach the theoretical maximum, you need to be listing materially more actual titles--because even though the more "populous" letter pages (like S) will soon "saturate" (be over 101k in length), the less-populated ones will continue to gain you pages as they grow in length toward saturation (but, to saturate the least-populated letter, typically Q, X, or Z, you'd have to be listing somewhere from a quarter million titles on up to almost five million, so self-evidently you cannot expect or hope to saturate all letters).

There are no hard and fast rules for an optimum number of titles, but clearly it is comfortably, but not vastly, over 4,900. My own experience with one site shows about 8,700 titles listed, of which about 4,700 would be seen by searchbots. In another case, a site I am still tuning for optimum, there are over 20,000 titles being listed, but 7 of the 28 pages are unsaturated with perhaps 4,200 titles being seen. On the other hand, for the first site, the largest single page is only about 350k long, while for the second the largest page is a ridiculously unwieldy megabyte-plus length. The lesson is that if you target about 8,000 to 10,000 titles, you are probably in the optimum range, though only an examination of the actual distribution of file sizes across the 28 search dropin files will tell for sure.
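To see how an uneven spread changes the count of titles the searchbots actually see, here is a minimal Python sketch. It assumes the per-page cap of 172 titles that follows from the rough byte estimates given earlier; the function names and the sample numbers are made up for illustration and are not data from any real site.

```python
# Count how many listed titles fall inside the part of each page that
# robots are believed to read. Names and sample numbers are illustrative.
TITLES_SEEN_PER_PAGE = 172  # (101,000 - 4,500) // 560, from the rough estimates


def visible_titles(titles_on_each_page):
    """Sum the titles on each page, capped at the per-page robot limit."""
    return sum(min(n, TITLES_SEEN_PER_PAGE) for n in titles_on_each_page)


def saturated_pages(titles_on_each_page):
    """Number of pages holding more titles than robots will read."""
    return sum(1 for n in titles_on_each_page if n > TITLES_SEEN_PER_PAGE)


if __name__ == "__main__":
    # A made-up, uneven distribution over five pages (not real data).
    sample = [900, 400, 172, 60, 5]
    print(visible_titles(sample), saturated_pages(sample))
```

Feeding it the real per-page title counts from your own 28 pages shows at a glance how many pages are saturated and how many titles are still gaining you visible pages.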


Your Amazon-search "keyword phrase" can be a single word, or several words; if it is more than one word, just separate the words with blank spaces. Multiple words in a phrase are "and'ed": titles are returned only if they are relevant to every word in the phrase--so using more than one word always acts to narrow down the results, with each extra word further narrowing the field.

Sometimes the selection process is natural and easy; a site whose theme is "vegetable gardening" would find that simply using Vegetables would return a satisfactory number of results. But sometimes the selection requires some thought if your first couple of stabs yield only low--or ludicrously high--totals. There is, though, always a satisfactory phrase if you just think about it long and hard enough.

You will effect this trial-and-error effort by using a one-time-use package tool called bookcount.php. This tool will report, in a second or three, the following--based on what keyword search phrase you are trying (at its first run, it will use the default word listed in your current customize.inc file)--in a form like this:

Search-Phrase Results:

For the search keyword phrase Spokane, Amazon would return exactly 177 "raw" titles--but, since most would be unavailable, you would, by rough rule of thumb, probably actually end up listing somewhere from 44 to 62 books in all, with around 53 being the most likely figure.

Word/Phrase to try next:

(type in word/phrase then press Enter)

177 (for this example) is the actual number of titles Amazon would return; it is a definite number. But this package--as a service to your visitors--will strip out all books that are not actually available to be bought at the moment (typically they are out of print, but Amazon does not delete them from its catalogue). For most fair-sized lists of titles, typically about 70% of what Amazon has in its catalogue will not really be buyable; the script gives you a rough estimate that runs from 25% (here, the 44) to 35% (here, the 62) of the "raw" total Amazon reports, with 30% as the midpoint (here, the 53 figure).
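The rule of thumb is easy to reproduce by hand. The Python sketch below is only an illustration of the 25% / 30% / 35% arithmetic; the function name is hypothetical, and the rounding to the nearest whole title is an assumption about how bookcount.php presents its figures.

```python
def estimate_listable(raw_titles):
    """Return (low, likely, high) estimates at 25%, 30% and 35% of raw.

    Rounding to the nearest whole title is an assumption; the package's
    own script may round differently.
    """
    low = round(raw_titles * 25 / 100)
    likely = round(raw_titles * 30 / 100)
    high = round(raw_titles * 35 / 100)
    return low, likely, high


if __name__ == "__main__":
    print(estimate_listable(177))
```

For the 177 raw titles of the Spokane example this returns 44, 53, and 62, the same three figures shown in the sample report.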

(Rather obviously, "Spokane" would be a terrible choice for a search keyword; indeed, it is the default provided just because it produces so few titles, making it a fast first-trial run.)

You can now try different words and phrases using bookcount.php; you can begin by typing a first trial into the box shown above, then pressing your Enter key: that box will actually work, but you will then leave this document and be looking at actual output from bookcount.php. Keep trying various words and phrases till you have one whose titles count you are satisfied with. Carefully note it down somewhere, then return to this document by using your browser's Back button (which might take multiple uses of that button).

(If you are still, after reasonable effort, having trouble getting results of a size you are comfortable with, check the package docfile ComplexSearches.html to see how to construct--you guessed it--more-complex searches.)

Now edit your settled-on word or phrase into your local copy of customize.inc (at Step #2 therein)--and pay attention to the notes.

While you're at it, this is the time to also edit some of the other data in customize.inc, especially those at Steps #3, #4, and #5 (you may want to wait to edit the data at Steps #6, #7, and #8 till you have further customized your holder.shtml template file). When you're done, upload the edited customize.inc as described below.

How to upload edited files:

Do not just upload the file!

Make a copy of the file with an "at sign" @ in front of the name:

   example: @customize.inc

Upload that "at-sign'ed" copy to your /golf-books directory.

Run the script finstall.php (which can safely be run any time).

After finstall.php has made an "un-at-signed" duplicate, delete the
"at-signed" copies from both the server and your local directory.

(Follow that procedure for uploading any package file modified since the original install upload, unless it's a file going into your root directory.)
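If you edit package files often, the local copy-and-rename step of that procedure can be scripted. This is a minimal Python sketch, not part of the package: the function name is hypothetical, and the upload itself and the finstall.php run still have to be done as described above.

```python
import shutil
from pathlib import Path


def make_at_signed_copy(path):
    """Copy e.g. customize.inc to @customize.inc in the same directory.

    Returns the path of the new "at-signed" copy; the original file is
    left untouched.
    """
    source = Path(path)
    target = source.with_name("@" + source.name)
    shutil.copy2(source, target)
    return target


if __name__ == "__main__":
    # Assumes customize.inc is in the current working directory.
    print(make_at_signed_copy("customize.inc"))
```

Remember to delete the at-signed copy locally (and on the server) once finstall.php has made its duplicate.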


10. Now you are ready to do the search that will actually establish your real bookshop. Before you do, I remind you that as your search runs your browser may not show anything save "page loading" till the whole search is complete. (Later on you will be shown how to start the process as a daemon, so you don't have to sit waiting for it.) To guesstimate the run time, reckon on about ten raw titles a second; thus, a search that will produce about 7,200 raw titles (which you might expect to yield roughly 2,160 final titles) will take about 720 seconds, or 12 minutes. But one that seeks out 32,000 raw titles can take a solid hour or so.
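The guesstimate above is simple enough to put into a few lines of Python. The names are illustrative only, and the one-page-a-second rate is the rough figure from the text, not a measured value.

```python
import math

TITLES_PER_AMAZON_PAGE = 10   # one fetched page holds ten raw titles
SECONDS_PER_PAGE = 1          # roughly one page fetched per second


def pages_to_fetch(raw_titles):
    """Number of data pages needed (the last one may be partly full)."""
    return math.ceil(raw_titles / TITLES_PER_AMAZON_PAGE)


def estimated_seconds(raw_titles):
    """Rough run time in seconds for a search of this many raw titles."""
    return pages_to_fetch(raw_titles) * SECONDS_PER_PAGE


if __name__ == "__main__":
    for raw in (177, 7200, 32000):
        print(raw, pages_to_fetch(raw), estimated_seconds(raw))
```

For 177 raw titles this gives 18 pages, which matches the BEGUN 18 line in the sample output earlier; 7,200 raw titles gives 720 seconds, and 32,000 gives 3,200 seconds, a little under an hour before any overhead.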

When your run is done, you will again see a column of INDEX xxx text and at the bottom the word FINISHED; till you see FINISHED, the script is still running.

(Reminder: it is wise to start the run in either a new browser tab, or--if you don't have tabs--a new browser window.) Start the process, then return here so we can do some things while it's running.

Here's the link to start the search process.


11. While you are waiting for the search process to finish, use the time to customize the two root-directory files you need to work with.

11a. First, your robots.txt file: this is a file that must (or it won't work) reside in your site's root directory (the one where your front page is). The finstall.php script will have reported to you whether you have one, and, if so, what it contains (but that's only a help--look manually to be sure!).

(Here, "root" means true root--that is, if you are working with a subdirectory/subdomain type "site", the file robots.txt needs to be in the true, highest root directory.)

If you have one, just add the lines from the included sample into it. (Download it, edit it, and upload it back.) If you are absolutely, positively sure that you don't have such a file now, just rename the one included with this package to robots.txt (that is, drop the .SAMPLE extension) and upload it to your site's root directory.

All the added (or new) text does is tell search-engine robots to not attempt to follow any links from the files affs.php or abe.php--thus, any outbound link sent through those forwarders will be invisible to search engines. If you don't do this, you will needlessly bleed some overall site Page Rank (to Abebooks for one thing). (By the way, each of those thousands of lovely new pages will have a backlink pointed at your site's front page.)
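Once your robots.txt is in place, you can sanity-check that it blocks what you expect with Python's standard urllib.robotparser module. In this sketch the rules text and the /golf-books/ paths are assumptions made up for illustration; the lines you should actually use are the ones in the sample file included with the package.

```python
from urllib.robotparser import RobotFileParser

# ASSUMPTION: these rules and paths are placeholders for illustration;
# the real lines are the ones in the package's sample robots file.
ASSUMED_RULES = """\
User-agent: *
Disallow: /golf-books/affs.php
Disallow: /golf-books/abe.php
"""


def is_blocked(rules_text, url, agent="*"):
    """True if the given robots.txt text forbids `agent` from fetching url."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return not parser.can_fetch(agent, url)


if __name__ == "__main__":
    site = "http://www.example.com"  # placeholder domain
    print(is_blocked(ASSUMED_RULES, site + "/golf-books/affs.php"))
    print(is_blocked(ASSUMED_RULES, site + "/golf-books/book-titles-M.shtml"))
```

To test your real file, paste its contents in place of ASSUMED_RULES and use your own domain and paths.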

11b. Now for your .htaccess file. Because this file is very important to you and your web site, and what you do to it is (though, for this package, very simple) critical, I have provided a separate docfile just about the .htaccess file. If you are an old pro with such files, excellent--you probably know a deal more than I do. But if you're not an htaccess guru, you may learn some interesting things. At any rate, what exactly you need to do, and just how to do it, are in the package file YourFile: please read that now, then return here when you're done. (As you leave here for now, recall that your browser is, presumably, still spinning on your book search.)


12. When your browser search tab or window shows--as described above in Step 10, where you started your book search--that the search run has finished, it's time to see what your new bookshop looks like so far (you still have significant customizing to do).

Eventually--after you have properly customized it--the file mybookshop.shtml, placed in your site's root directory, will be the "front door" to your new bookshop, and an integral part of your site. Right now, you haven't customized it, and there is no copy in your root anyway. So, for the moment, you will look at your shop through a special setup-use-only file that the installer made for you, shoptest.shtml, which is basically mybookshop.shtml with the links reset so that they'll work for a file at this level in your site directory.

You will be looking at your new bookshop as your site visitors (and search engines!) would see it (and you will see why the page needs customizing!). Explore your 28 end-product "real" or "static" pages; try clicking on some titles, and make sure that everything there looks OK too.

Reminder: you won't be able to see any individual-book pages unless
you have already placed those magic "Rewrite" lines in your .htaccess file!

(And recall, as you try various individual titles, that not all books will have reviews at Amazon, nor will all books have cover images available for them, so keep poking about till you have seen a fair number of samples.)

OK, so now click and look around (preferably in a separate tab or window).


13. If you have reached this point still in a happy frame of mind, your remaining chores are few. You need to customize two files--mybookshop.shtml, your new shop's "Front Door", and holder.shtml, the template for the 28 title-list pages--to properly reflect you, your site's theme, and your site's "look and feel"; there are extensive directions explaining what you need to do and what you can do in the separate package docfile CustomizingPlus.html.

You also may need to customize one line in the file amazon-message.shtml, the one down near the very bottom that reads:

<b><a href="http://www.1st-beginners-golf-swing-tips.com/mybookshop.shtml">the bookstore page</a></b>.

(but with whatever your site's domain is); if you have chosen to give that file a different name (such as widget-books.shtml), be sure to modify that line accordingly!

You should also give thought to customizing the two ancillary search files, book-search.shtml and used-books.shtml, for "look-and-feel" compatibility with the rest of your site.


14. Finally, when all is ready, you need to find a way to be sure that the dofind.php search gets done anew daily.

(One thing you should know about dofind.php is that if you call it with a particular parameter, you can then abandon it and it will merrily run on anyway as a "daemon process". To be exact, call it as--

http://www.1st-beginners-golf-swing-tips.com/golf-books/dofind.php?silent=yes

--and, as soon as your browser starts to "load" it (as shown by the browser's "loading file" indicator), count slowly to 5 for good luck, then just close the page. The process will finish up anyway.)

So you could, at worst, manually start it every day, or night, in that manner. But that's needlessly tedious.

Your host should have available to you a scheduler that allows you to set jobs to be auto-executed at some time and frequency you choose; typically--and especially on an Apache-based server--that scheduler will be cron; but what it is doesn't much matter. The way you use it is to set the job to be executed as the special script, provided for this purpose, cronf.php, which of course is in your /golf-books/ directory.

Do NOT try to start dofind.php direct--that will not work from cron.

Be sure to use only cronf.php as the cron target script for cron to run!

How exactly you set cron to run a php script can vary from server to server, and will depend in part on whether your host runs php as an Apache module or as a cgi-wrapper process. You need to confer with your host if you don't already know how to start php scripts from cron on that host but, as general background information:

The cronf.php file has already had its file-access permissions set to 777; cron typically requires a 7xx setting, so you're ready.

If your php is cgi-wrapped (as many are), your cron command needs to call php and pass it the location of cronf.php as a parameter; that often will look something like:

/usr/home/yourid/public_html/cgi-bin/php4.cgi -q
  /usr/home/yourid/public_html/siteroot/golf-books/cronf.php

(all on one line, of course)

If your host's php is just the Apache module, you should be able to simply pass the script's location as the cron command, something like:

/usr/home/yourid/public_html/siteroot/golf-books/cronf.php

But don't guess!

Confer with your host for exact instructions!



Some Important Further Package Notes:

  • The findbooks.php script (called by dofind.php) has a failsafe for those rare days on which Amazon has a bellyache and either isn't responding at all or is returning goofy results: after it completes its search, and knows how many titles it has found, it looks at the existing title count (the one from the last time it was run) and checks that the current number is at least 80% of the last; if not, it aborts, and leaves yesterday's pages in place. While the total of titles one will get from an Amazon search will vary a little from day to day (or, probably, hour to hour), it should be fairly stable; so, if the new total is less than 80% of the old, the script assumes something went seriously wrong and does not post the new results (which would materially lower your site's page count for that day).

    (This can, if you should for some reason--say you decided that 15,000 titles is too many--switch from one search phrase to another that produces materially fewer titles, cause a snag: the script will assume that it is Amazon having a problem, and refuse to write out the new results, even though it gets them and keeps them in its workfile. This is a rather unlikely scenario, but, if you run up against it, no problem: after doing a search on the new term (you probably did, or you wouldn't know you have this situation), just run the special-purpose package script pull.php; that script duplicates the work of findbooks.php except that it omits the search proper, so it will simply rebuild the title dropins from the existing workfile that your revised search already made--and you only need to use it once, because after that the situation will no longer exist.)

  • You will now find, in your /golf-books subdirectory /logs, a file cleverly named Log.File. Its current entry will look something like this:

      Started search, keyword phrase "Spokane", Sunday, 30 May 2004, 16:35:00
        Finished search, keyword phrase "Spokane", Sunday, 30 May 2004, 16:35:13
          having taken 0 minutes, 13 seconds.
      Started search, keyword phrase "Baseball", Sunday, 30 May 2004, 16:41:35
        Finished search, keyword phrase "Baseball", Sunday, 30 May 2004, 16:48:57
          having taken 7 minutes, 22 seconds.

    From that example, you will deduce that it is an additive log--that is, it will continue to grow till you delete it, when it will start anew. It is mainly useful if problems develop, but it will tell you that your job ran and how long it took. Every so often you should review it, then delete it, lest it grow forever.

  • The script that searches Amazon has a built-in microtimer to assure that it does not issue a page call to Amazon more often than once a second, because that's the maximum rate allowed by Amazon's "Terms of Service" for this data interface. The hit rate without the microtimer would be less than double this, which is not enough to be likely to bring the wrath of Amazon down, but why take risks? (This is why you can pretty reliably estimate run times at ten raw titles a second--ten raw titles is one Amazon XML page.) Be aware, though, that if your host is running Apache under Windows, that microtimer will not work--as so many things do not work on Windows-based servers--and so, though your search runs will be a little faster, you will almost certainly be technically in violation of Amazon's Terms of Service. (Windows Apache just doesn't have the needed function calls or anything like them.)

  • There is in this package a small diagnostic-tool script that is not mentioned anywhere else in the docs because you normally don't need it, but you might like to know it's there--phpinfo.php. If you run it, it just generates a long screen of all the data relevant to PHP on your host. You might find that screen interesting (or you might not).

  • There is also in this package a special-purpose file, putback.php, for use in upgrading from an older package version; information on how to use it is in Upgrading.html, but it simply restores backed-up versions of the six customized (or possibly customized) files after a re-install.

  • And one last note: remember that Google may not--almost surely will not--notice your thousands of new pages the minute they hit the web. It will notice them no faster (or slower) than if you had added that many real, physical, stored-on-drive new pages--which can take a long time. You can probably speed up the process by having other pages of your site, and/or your Site Directory, point to not only your "front door" mybookshop.shtml page, but also to the 28 title-listing pages, so that the 'bots can find the links to those list pages more easily and rapidly. Even so, though, it may take a while--sad to say, weeks or even months--till Google gets round to all several thousand. But if you can see them, they're there, and eventually the 'bots will see them too. (Oh, and it wouldn't hurt to get some external backlinks to your new bookstore.)
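The 80% failsafe described in the first of the notes above boils down to one comparison. This Python sketch shows the idea only; it is not the code in findbooks.php, the function name is hypothetical, and the handling of a first run with no previous count is an assumption.

```python
def should_publish(new_count, previous_count, minimum_percent=80):
    """True if the new title count is at least 80% of the previous one.

    Integer arithmetic is used so that the boundary case (exactly 80%)
    is not affected by floating-point rounding.
    """
    if previous_count <= 0:
        # ASSUMPTION: with no earlier run to compare against, publish.
        return True
    return new_count * 100 >= previous_count * minimum_percent


if __name__ == "__main__":
    print(should_publish(8000, 10000))   # exactly 80% of the old total
    print(should_publish(7999, 10000))   # just under 80%
```

It also shows why deliberately switching to a much narrower search phrase trips the check: the new count looks like a bad day at Amazon, which is the case pull.php exists to handle.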

And don't forget to go read CustomizingPlus.html and edit those few pages wanting customization! Do it now!

If you don't--especially as regards the "front page", mybookshop.shtml, or whatever you've called it--your shop will work, but will look rather strange to visitors! (Unless your site really does sell widgets . . .)

Otherwise, though, you have a working bookshop: just link to it when you've finished the page customizing.


Inquiries

My name is Eric Walker. Email me at webmaster@seo-toys.com. I am glad to work at length with anyone seeking to install and use this package.

--==ooOOoo==--