You may not be aware of it, but last year I created web-based versions of … looks like seven of my eBooks. It was a significant amount of work to get them set up, because of the way I wanted to do it – I used a WordPress modification which allows readers to comment on every single paragraph individually, and which divides the text into reasonably small “bites” of content. So for books like Cheating, Death I could break the text up by chapters (most are almost exactly 2,500 words – long for a web page, but not totally unreasonable, unlike, say, putting a whole novel on one long, scrolling page), and you can go in and comment on any individual chapter of the book if you want. (Say, if there were a typo, or a plot hole, or some other problem. Or if there were a particular scene you liked or didn’t like, and wanted to say so.) I like the idea of it, and while I’m not generally a fan of what commenting tends to be on most sites, I’ve seen this sort of setup put to excellent use, and I can imagine a lot of good things coming from it.
On the other hand, it’s ridiculously difficult to track how many people are reading such a thing. I’ve tried fixing it several times, but Google Analytics doesn’t report it properly. I’ve been downloading my server access logs and manually parsing them (to get eBook download numbers) since February of 2011, when 1 and 1 (my web host) changed their Web Statistics to “Site Analytics” and removed all the usefulness from the tool for me. I tried parsing out the data on access to the 7 domains/subdomains which hold the web-based versions of these novels, to see whether I could get any useful numbers about how many people have been reading them; to start, I just parsed February’s and December’s numbers, rather than going through the full year before figuring out whether anything useful would come of it. (Yes, I know, I could maybe write a script/program to parse the logs for me. That might even work for the eBooks, despite at least half of the logs being garbage – it looks to me like zombies requesting hundreds or thousands of nonexistent URLs, possibly as some wasted DDoS effort – but for these sites … I’ll explain.) The logs are a mess.
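For what it’s worth, the script idea mentioned above doesn’t have to be elaborate. Here’s a minimal sketch of the kind of thing I mean, assuming the server writes the standard Apache/nginx “combined” log format; the file path and sample lines are invented for illustration, not my actual setup:

```python
import re
from collections import Counter

# Matches the common Apache/nginx "combined" access log format.
# (An assumption -- different hosts configure this differently.)
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

def count_downloads(log_lines, target_path):
    """Count successful GET requests for target_path, broken down by client IP."""
    hits = Counter()
    for line in log_lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip malformed/garbage lines instead of crashing
        if (m.group("method") == "GET"
                and m.group("path") == target_path
                and m.group("status") == "200"):
            hits[m.group("ip")] += 1
    return hits

# Invented sample lines standing in for real log data:
sample = [
    '1.2.3.4 - - [05/Dec/2011:10:00:00 -0700] "GET /books/example.pdf HTTP/1.1" 200 12345',
    '1.2.3.4 - - [05/Dec/2011:10:05:00 -0700] "GET /books/example.pdf HTTP/1.1" 200 12345',
    '5.6.7.8 - - [05/Dec/2011:11:00:00 -0700] "GET /missing HTTP/1.1" 404 0',
]
counts = count_downloads(sample, "/books/example.pdf")
print(counts)  # Counter({'1.2.3.4': 2})
```

That much works fine for a single known file like an eBook PDF; the hard part, as I get into below, is everything it doesn’t do – deciding which requests came from humans in the first place.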
I’d have to figure out which IPs are robots first, I think, so I can throw out all their requests – a huge share of the requests are clearly spiders following every single link on every single page. Since every paragraph has a unique URI for its location and a corresponding link to its own comments, there are hundreds or thousands of links per book which I know no human would ever have clicked; they’re links to comment pages which clearly say there are zero comments. From what I can tell, there’s at least one Russian spider/bot following every link of every page of all these domains at least once a month, using a wide range of IP addresses to do so. Plus Google, which isn’t as thorough or as frequent – which seems reasonable, since none of these sites have been updated in the slightest in a year.
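If I were going to attempt the bot-filtering step in code, one common heuristic is to flag any IP that either fetched robots.txt or sent a user-agent containing a crawler keyword, then drop everything from those IPs. A sketch, with the keyword list, the tuple layout, and all the sample data being my own assumptions rather than anything authoritative:

```python
# Common substrings found in crawler user-agent strings. This list is a
# guess/heuristic, not exhaustive -- plenty of bots lie about their UA.
BOT_KEYWORDS = ("bot", "spider", "crawl", "slurp", "yandex")

def is_bot_ua(user_agent):
    ua = user_agent.lower()
    return any(k in ua for k in BOT_KEYWORDS)

def bot_ips(records):
    """records: iterable of (ip, path, user_agent) tuples parsed from the log.

    An IP is flagged as a bot if it fetched /robots.txt (well-behaved
    crawlers do; humans essentially never do) or used a crawler-like UA.
    """
    ips = set()
    for ip, path, ua in records:
        if path == "/robots.txt" or is_bot_ua(ua):
            ips.add(ip)
    return ips

# Invented records for illustration:
records = [
    ("9.9.9.9", "/robots.txt", "SomeCrawler/1.0"),
    ("9.9.9.9", "/chapter-1/", "SomeCrawler/1.0"),
    ("1.2.3.4", "/chapter-1/", "Mozilla/5.0"),
]
bots = bot_ips(records)
human_hits = [(ip, path) for ip, path, _ in records if ip not in bots]
print(human_hits)  # [('1.2.3.4', '/chapter-1/')]
```

Of course, this only catches the honest bots – the ones hammering nonexistent URLs from rotating IPs wouldn’t necessarily trip either rule, which is exactly why I said a human still has to look at the results.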
ASIDE: Oh, yeah, that’s another thing. There hasn’t been a single comment anywhere on any of the books in a year. (Well, come to think of it, those Russian IPs are probably the spam bots posting spam comments Akismet has no trouble automatically moderating. There are huge numbers of those.) Whether or not anyone is reading these versions of the books, they certainly aren’t commenting on them. Or linking to them (no trackbacks), or emailing me / calling me / texting me about them. (Aside to the aside: While I was in the middle of writing this post, I received a phone call from someone asking whether I buy poetry. The person said they had, maybe, six or seven poems. Apparently, ever. It’s like people can’t read.)
So I can pretty easily see how much traffic a particular domain/subdomain received, based on the logs – but a lot of that traffic is bots, not humans. Worse, the bots mean that if I try to total up access to individual pages of each book, I’ll have to manually filter out all the requests the bots made for things humans didn’t. There’s no easy script for that, because I have to make a human judgment about which pages people might plausibly have clicked on and which they clearly didn’t (or which aren’t worth counting), and there are hundreds to thousands of those little decisions per domain per month of data. Some of it isn’t even bots, but bot-garbage: requests for non-existent pages. I thought I’d take a look at the 1 and 1 Site Analytics to see what it said, and at the way, way lower Google Analytics numbers for comparison, but they’re all wildly different from one another. For reference, the official 1 and 1 Site Analytics tool reports fewer than 1/4 of the requests for my most popular eBook file (the PDF, not the web version) compared to the raw logs those analytics are theoretically built from, and for the other files I’ve already parsed, the variations are all over the board. Likewise, if the 1 and 1 Site Analytics tool were to be believed, in December 2011 around a thousand different people each read one chapter of the web version of Cheating, Death (pretty evenly distributed across all 13 chapters), and a small handful read every chapter. My access logs show almost 2,000 page requests for the same period – almost double what 1 and 1 shows.
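The mechanical half of that totaling-up – counting unique visitor IPs per page once the bot requests are filtered out – is the easy part, and could look something like this sketch. The paths and the “/comments/” URL convention here are invented for illustration; the hard, unscriptable half (deciding which remaining pages a human plausibly clicked) stays manual:

```python
from collections import defaultdict

def unique_visitors_per_page(hits):
    """hits: iterable of bot-filtered (ip, path) pairs.

    Tallies unique visitor IPs per page, skipping the per-paragraph
    comment permalinks no human reader would have followed.
    """
    visitors = defaultdict(set)
    for ip, path in hits:
        if "/comments/" in path:  # assumed comment-permalink pattern
            continue
        visitors[path].add(ip)
    return {path: len(ips) for path, ips in visitors.items()}

# Invented sample data:
hits = [
    ("1.2.3.4", "/chapter-1/"),
    ("1.2.3.4", "/chapter-2/"),
    ("5.6.7.8", "/chapter-1/"),
    ("5.6.7.8", "/chapter-1/comments/42/"),
]
print(unique_visitors_per_page(hits))
# {'/chapter-1/': 2, '/chapter-2/': 1}
```

Which is exactly the shape of report I’d want – chapter by chapter, how many distinct people – if only the input weren’t so polluted that producing a trustworthy `hits` list takes dozens of hours of hand-sorting.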
Google shows … 20 page requests from 11 visitors … though admittedly, they’ve mixed in numbers from four other books (all the books in the Lost and Not Found universe are on the lostandnotfound.com domain, and I can’t get Google Analytics to properly separate out the subdomains), so that’s 20 page requests across the several hundred pages of five books … and really only from 9 different pages, only 1 of them from Cheating, Death … except it isn’t that, either. Google has no idea what to do with these web pages.
So how many people are actually reading these versions? While I don’t want to invest the dozens of hours it would take to fully parse the data, at a glance it looks like very few – possibly none, depending on the bots; maybe a dozen people a month. Why am I asking? Because I have to pay the renewal fees on those domains every year. Is it worth $9/year (and/or the hassle of moving them to modernevil.com, or moving the registrations to another registrar, or whatever) for zero to perhaps a dozen people a month to read these versions of these books, instead of the other sixteen ways they can read them (seven of them free)? This year I’m cutting out recurring costs for things my readers don’t take enough advantage of to be financially worthwhile (see my posts on canceling distribution, if you haven’t yet). I’ve got a few months, but I’ve got to decide whether or not to keep paying to maintain the dragonstruth.com and lostandnotfound.com domains … and whether, if/when I release the domains, I should bother getting the web-based versions of the books back up and running on one of the domains I’m keeping.
Speaking of which, what do you think about my moving this blog to, say, teelmcclanahan.com/blog/? That site probably needs a revamp anyway, and if I’m paring down domains, maybe lessthanthis.com is one to subtract, too. Considering how rarely I get comments, I’ll probably turn off blog comments while I’m at it. I ask these sorts of open-ended questions – questions only readers of the blog can answer – and don’t get answers … maybe I’d do better about not bothering to ask (or feeling compelled to ask) if comments were just … gone.