While I'm not disputing Nick's point that the best way to understand what your server is serving is to look at the log files, it's not too hard to add a little jQuery script to a page that can send a notification to Google Analytics when a document link is clicked. We did this recently with a site running in a CMS. Links to downloadable documents get decorated with a (.PDF) notation and an icon, and clicking the link will snitch to GA. This is easier to do on the client side than it would be on the server side. And because it's a CMS that will be maintained by the client, they don't need any training in how to format the link. They just put in a plain link to a .pdf, and the script finds it, makes it pretty, and adds click-tracking.
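For illustration, here is a minimal sketch of that kind of script; it is not the script from that site. It assumes the classic asynchronous ga.js snippet (the global `_gaq` queue and its `_trackEvent` command) and jQuery on the page. The extension list, the event category "Downloads", and the CSS class "doc-link" (for the icon) are hypothetical. The two helper functions are plain JavaScript; the jQuery wiring only runs in a browser.

```javascript
// File extensions treated as downloadable documents (assumed list).
const DOC_EXTENSIONS = ['pdf', 'doc', 'docx', 'xls', 'xlsx', 'wmv', 'avi'];

// Return the lower-case extension if href points at a tracked document, else null.
function documentExtension(href) {
  // Ignore any query string or fragment when looking at the extension.
  const path = href.split(/[?#]/)[0];
  const match = /\.([a-z0-9]+)$/i.exec(path);
  if (!match) return null;
  const ext = match[1].toLowerCase();
  return DOC_EXTENSIONS.includes(ext) ? ext : null;
}

// Build the ga.js event command for a click on a document link.
// "Downloads" is a hypothetical category name.
function trackingCommand(href) {
  const ext = documentExtension(href);
  if (ext === null) return null;
  return ['_trackEvent', 'Downloads', ext.toUpperCase(), href];
}

// Browser-only wiring: decorate each document link and report its clicks to GA.
if (typeof window !== 'undefined' && window.jQuery) {
  window.jQuery(function ($) {
    $('a[href]').each(function () {
      const link = $(this);
      const href = link.attr('href');
      const ext = documentExtension(href);
      if (ext === null) return; // not a document link, leave it alone
      // Visible file-type notation plus a hypothetical class styled with an icon in CSS.
      link.append(' (.' + ext.toUpperCase() + ')').addClass('doc-link');
      link.on('click', function () {
        window._gaq = window._gaq || [];
        window._gaq.push(trackingCommand(href));
      });
    });
  });
}
```

One limitation worth knowing: when the click navigates away, the page may unload before the event is sent, so a commonly suggested workaround is a short delay before following the link.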

Carl Raymond


On 4/12/2012 9:06 AM, James White wrote:

Hi All,

 

Below I’ve reproduced the meat of a conversation between ISP IT staff and Michael VanPutten about tracking non-HTML files such as .pdf, .doc, and .wmv files. I’m interested in finding out what others are doing to bring all the “statistics” into one place.

 

From: James White
Hi All,

 

I believe that Google Analytics is on all HTML pages of all of our websites now. While the http://webstats.isp.msu.edu/ system automatically tracked uses of .pdf, .wmv, and other non-HTML files, Google Analytics cannot do so without extra code being added to each link within our sites. And Google Analytics cannot track requests for such .pdf, etc., files at all when they come from links on external sites. For details see http://support.google.com/googleanalytics/bin/answer.py?hl=en&answer=55529

 

Do we want to do this across all sites, more selectively, or not at all? Do we want to do it by hand-coding such links? The biggest problem with hand-coding is that when things get moved to the CMS, each and every CMS user becomes responsible for properly creating the links. An alternative is to have our core code do it at the same time it deals with presenting the Adobe link at the bottom of the page and adding a (PDF)* to such links. The code would also optionally need to be updated for .doc, .wmv, .avi, and other such files. The advantage of doing it here is that (given someone has the time) it can be added to the code now. The biggest problem is that it further slows down page serving. Yet another possible alternative (and it will need some exploration) is adding it via the CMS when it generates its output. In theory, if XSLT processing (or maybe Velocity?) can do it, it can be added at the time the page file is published.
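Roughly, the core-code or publish-time option amounts to a text transform over each page's HTML. The sketch below is in JavaScript only to match the other examples here; the real core code (PHP) or CMS step (XSLT or Velocity) would look different. It assumes simple `<a>` tags whose double-quoted `href` ends directly in the extension, the ga.js `_gaq` queue, and the same hypothetical "Downloads" category. Links that already carry an `onclick` are left alone, which also keeps a second run from decorating a link twice. A regular expression is a shortcut here; an HTML parser would be safer on real pages.

```javascript
// Matches <a ... href="something.ext" ...>text</a> for the assumed document types.
// Assumes double-quoted href values, no query string after the extension, no nested <a>.
const DOC_LINK = /<a\s([^>]*?)href="([^"]+?\.(pdf|docx?|xlsx?|wmv|avi))"([^>]*)>([\s\S]*?)<\/a>/gi;

// Add a "(TYPE)" label and a ga.js click-tracking handler to each document link.
function decorateDocumentLinks(html) {
  return html.replace(DOC_LINK, (whole, before, href, ext, after, text) => {
    if (/onclick=/i.test(before + after)) return whole; // already has a handler; leave as-is
    const type = ext.toUpperCase();
    // Escape backslashes and single quotes so the href stays inside the JS string literal.
    const jsHref = href.replace(/\\/g, '\\\\').replace(/'/g, "\\'");
    const onclick = `_gaq.push(['_trackEvent','Downloads','${type}','${jsHref}']);`;
    return `<a ${before}href="${href}" onclick="${onclick}"${after}>${text} (${type})</a>`;
  });
}
```

Run once at publication, this would match the preference for doing such handling only when the CMS publishes a page rather than on every request.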

 

Personally, I’d like to see all of the email cloaking, .pdf special handling, etc., be done as part of the CMS publishing output if possible so that it only occurs once at CMS “publication.”

 

The fastest option is the brute-force one: have a student who is not a programmer spend 30-60 hours (or maybe a few more?) finding the links and updating them by hand. But that leaves the issue of future links, plus the baggage of having that code in the pages if we ever want to take the automated route.

 

The second fastest is to have a student who is a programmer add this to the current .pdf handling process. This would probably wait in the queue for at least 3-4 weeks unless we move it ahead of other projects, and it would probably take 15-80 hours depending on the programmer's skill level, plus my time and attention. Their effort would then most likely be replaced by the CMS publishing option somewhere down the line.

 

From: Peter Cole

I think the question of whether or not to track PDF and media usage is one that Stephanie should answer, as she is the one (as I understand it) who uses such information for reports, site redesigns and so on.

 

If she is interested in this information, then I think you are absolutely right. If the CMS can automatically add the necessary code so that neither the end users nor, more importantly, any of us in IT are burdened with additional work, then it makes sense to go that route.

 

I believe I was told at some point that Dayo would like this CMS up and running in June. If that is the case, then I do not think it is essential to spend time adding this tracking code to our sites now, when it can soon be automated. While it is true that we would not have that information in Google Analytics for a month and a half (give or take), I am not sure the loss would be that critical (though this too is a decision for Stephanie). Additionally, we will still be running AWStats, so we would still be tracking it, just in a different system.

 

 

From: James White
The CMS is capable (presumably) of XSLT processing of the page content that it publishes. But it needs to be tested carefully, in particular to be sure it will properly handle (i.e., basically skip over) PHP code that happens to be in the web page. After that, it requires rather complex XSLT programming, which may be possible but goes well beyond what XSLT was really designed to do.

 

So my thinking is that in an ideal world we would wait for the CMS, but in the practical world, if we want any timely .pdf statistics, we’re probably better off updating our core code (sometime in the next month or two) and then replacing that functionality later, whenever it’s possible to get the CMS to do it.

 

 

From: Stephanie Motschenbacher

Interesting... Currently I am depending on data that I gather from my social media efforts. Usually we promote newly posted items such as PDFs, videos and photos. Although it isn't perfect, it gives me some sense of what people are interested in.

 

Is there anything we can do structurally, such as creating special landing pages to help with tracking? I think Mike VanPutten might be a resource. I am sure UR is trying to collect similar information.

 

 

From: James White
Hi Michael (and All),

 

Michael, I’m adding you to our internal ISP discussion of plans/techniques for using Google Analytics to track non-HTML web files such as .pdf, .wmv, .avi, etc., to see how you might be dealing with it (maybe I’ll throw that thought out to the WebDev CAFÉ in a few minutes too).

 

For .pdf, .wmv, etc., files it is certainly possible to embed them in a page that Google Analytics can track. For example, we embed the African Studies Tuesday Bulletin .pdf within the http://africa.isp.msu.edu/whatsnew/tuesdaybulletin.htm page, but we also include an “Open PDF in new window/tab” link, since the user may want the standalone .pdf file or, in older browsers that don’t support such embedding, may be forced to follow the standalone .pdf link. So anyone who wanted to link externally to a specific Tuesday Bulletin issue would simply follow that link and grab the URL (e.g., http://africa.isp.msu.edu/whatsnew/tuesday/PDF/TB2012-02-28.pdf), and anyone following that external link would still not be tracked.

Normally all our videos are already embedded within pages such as http://www.isp.msu.edu/multimedia/?video=1-14-31, so as long as you include the query string (?video=1-14-31) at the end, you get directly to the correct video. I don’t remember whether we still have any pages that simply list videos for people to link to or download, though I believe African Studies, Asian Studies, and Study Abroad had such pages in the past. I think most of our videos have probably been converted over to the embedded form.

So the problem would mainly be tracking use of .pdf files and some .doc and .xls files. I don’t know of any graceful way to embed the latter two (Word and Excel) types of files that works across browsers, including older ones.

 

Certainly an intermediate landing page for some files would work, but it may also frustrate users, some of whom may then not bother to take the extra step to get the actual file.

 

From: Michael VanPutten
James,

 

There are two webpages that may be of assistance with what you are trying to do:

 

"How do I manually track clicks on outbound links?"

http://support.google.com/googleanalytics/bin/answer.py?hl=en&answer=55527

 

Event Tracking Guide

http://code.google.com/apis/analytics/docs/tracking/eventTrackerGuide.html

 

In short, you track the click on the link to the file rather than the file itself. You would also likely be able to cross-reference your Google Analytics information with server logs for file requests (e.g., how many times worddoc-xyz.docx was downloaded/requested).
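For the server-log side of that cross-referencing, here is a minimal sketch that counts document requests per file, assuming Apache-style Common or Combined Log Format lines. Counting only successful `GET` requests (status 200), ignoring query strings, and the extension list are simplifying assumptions; AWStats, mentioned earlier in the thread, already produces similar per-file counts.

```javascript
// Method, request target, and status from a Common/Combined Log Format line, e.g.
// 1.2.3.4 - - [12/Apr/2012:09:06:00 -0400] "GET /docs/a.pdf HTTP/1.1" 200 5120
const LOG_LINE = /^\S+ \S+ \S+ \[[^\]]+\] "(\S+) (\S+)[^"]*" (\d{3}) /;
// Document types to count (assumed list).
const TRACKED = /\.(pdf|docx?|xlsx?|wmv|avi)$/i;

// Return an object mapping each document path to its number of successful GETs.
function countDocumentRequests(lines) {
  const counts = {};
  for (const line of lines) {
    const m = LOG_LINE.exec(line);
    if (!m) continue; // skip malformed or unrecognized lines
    const [, method, target, status] = m;
    const path = target.split('?')[0]; // ignore query strings
    if (method !== 'GET' || status !== '200' || !TRACKED.test(path)) continue;
    counts[path] = (counts[path] || 0) + 1;
  }
  return counts;
}
```

In Node, the lines could come from something like `fs.readFileSync(logPath, 'utf8').split('\n')`, where `logPath` is whatever access log the server writes.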

 

[end of recorded conversation]

 

Other thoughts are welcome. Basically, we want the statistics without the 40 people who maintain content having to learn and apply special coding for .pdf, etc., files, and without tons of programming to merge server logs with Google Analytics so that we can actually see direct hits on .pdf, etc., files from external links. Or, if you’re just using Google Analytics and ignoring the hits it doesn’t record, that’s a useful piece of info too.
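As a rough sketch of how the log side could isolate the hits Google Analytics never sees, assuming Combined Log Format (which records the Referer header): a request whose referer is one of our own hostnames presumably came from a tracked page, so GA may already have it as a click event; anything else (an external referer, or none at all, as with bookmarks and many e-mail clients) is counted here as an untracked direct hit. The hostname list passed in is hypothetical, and the referer is only a hint, since browsers can omit it.

```javascript
// Combined Log Format: ... "GET /path HTTP/1.1" 200 1234 "referer" "user-agent"
const COMBINED = /^\S+ \S+ \S+ \[[^\]]+\] "(\S+) (\S+)[^"]*" (\d{3}) \S+ "([^"]*)"/;
// Document types of interest (assumed list).
const DOC_PATH = /\.(pdf|docx?|xlsx?)$/i;

// Hostname of a referer URL, or null if it is missing ("-") or unparsable.
function refererHost(referer) {
  if (!referer || referer === '-') return null;
  try {
    return new URL(referer).hostname.toLowerCase();
  } catch (e) {
    return null;
  }
}

// Count successful document requests that did NOT come from one of our own pages.
function countUntrackedHits(lines, ownHosts) {
  const own = new Set(ownHosts.map((h) => h.toLowerCase()));
  const counts = {};
  for (const line of lines) {
    const m = COMBINED.exec(line);
    if (!m) continue; // not a Combined Log Format line
    const [, method, target, status, referer] = m;
    const path = target.split('?')[0];
    if (method !== 'GET' || status !== '200' || !DOC_PATH.test(path)) continue;
    const host = refererHost(referer);
    if (host !== null && own.has(host)) continue; // from our pages; GA may have the click
    counts[path] = (counts[path] || 0) + 1;
  }
  return counts;
}
```

The design choice here is to treat "no referer" as external, which errs toward counting a hit as untracked rather than assuming GA saw it.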

 

 

James

James White, Web Coordinator
International Studies & Programs

Michigan State University

International Center

427 N. Shaw Lane, Rm 207

East Lansing, MI, 48824

(517) 884-2142

[log in to unmask]

 



-- 
Carl Raymond
Software Developer
University Outreach & Engagement
Michigan State University
Kellogg Center, Garden Level
219 South Harrison Road
East Lansing, MI 48824-1022

[log in to unmask]
(517) 353-8977
http://outreach.msu.edu/