Post by Maarten Zeinstra
Thanks for the insights, I didn't realise https doesn't send referrers. Seems logical though.
Browsers aren't supposed to send a referrer where the link is on a
secure page and the target is an insecure page.
http://www.w3.org/Protocols/rfc2616/rfc2616-sec15.html#sec15.1.3
But sending a referrer is always at the option of the client, and in
my experience, referrer isn't sent going from insecure->secure either.
I don't guarantee this will always work. :)
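
A minimal sketch of the rule in RFC 2616 section 15.1.3, which says a client should not include a Referer header in a non-secure request when the referring page was fetched over a secure protocol. The function name `should_send_referrer` is hypothetical, and the sketch models only the RFC recommendation, not what any particular browser or privacy extension actually does.

```python
from urllib.parse import urlsplit


def should_send_referrer(source_url: str, target_url: str) -> bool:
    """Return False only for the secure -> non-secure case from RFC 2616 15.1.3.

    Hypothetical helper: real clients may drop the referrer in more cases,
    since sending it is always optional.
    """
    source_scheme = urlsplit(source_url).scheme.lower()
    target_scheme = urlsplit(target_url).scheme.lower()
    # The RFC recommendation covers only a secure page linking to a non-secure one.
    if source_scheme == "https" and target_scheme != "https":
        return False
    return True
```

Under this rule alone, an https page linking to an http deed loses its referrer, while the same page linking to an https deed keeps it.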
Post by Maarten Zeinstra
I linked to the https versions of the licenses now. It was interesting that a user only noticed this now, after years of the links being like that. Probably they don't care, like Puneet says.
It seems like you are proposing a good solution; however, I would first like to see how many times we enrich the deed pages per month, to see if it is being used at all. I hardly ever see an enriched page, mainly because I recently tightened my browser's privacy with Ghostery, HTTPSEverywhere and AdBlockPlus. If many users do this, then this whole metadata scraper idea is dead.
HTTPSEverywhere generally defeats the scraper, as most license links
are to http://creativecommons... and HTTPSEverywhere either causes the
referrer to be dropped and/or confuses the scraper. Should be possible
to mitigate this by always using https for license deeds, including
providing https urls for links. CC should probably do this.
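
The mitigation above (always publishing https license links) could be sketched roughly as follows. The function name `to_https_license_url` and the host list are assumptions for illustration, not anything CC actually ships.

```python
from urllib.parse import urlsplit

# Assumed set of hosts whose license links should be upgraded.
CC_HOSTS = {"creativecommons.org", "www.creativecommons.org"}


def to_https_license_url(url: str) -> str:
    """Rewrite an http:// Creative Commons link to https://; leave others alone."""
    parts = urlsplit(url)
    if parts.scheme == "http" and parts.hostname in CC_HOSTS:
        # Swap only the scheme; host, path, query and fragment are unchanged.
        return parts._replace(scheme="https").geturl()
    return url
```

A publisher or the license chooser could run generated links through something like this, so that an https page never links to an http deed.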
I don't know that AdBlockPlus does anything with referrer; Ghostery
may. I haven't used it in a long time, in favor of
https://disconnect.me/, and I admit I haven't checked whether that
does anything with referrer.
Another problem is that the scraper will probably miss anything using
modern RDFa (1.1 Lite), which is also a bit less fragile due to at
least not requiring a namespace declaration for common CC use cases. If
the scraper is useful at all it really ought to be updated to support
this. Same for the HTML provided with the chooser and documentation.
And I think it makes sense to be neutral about formats and also look
for microdata and microformats annotations.
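
As a rough illustration of what prefix-only annotations look like to a consumer, here is a sketch that collects `rel="license"` links and `cc:`-prefixed `property` values without any namespace declaration in the markup. It is not a conforming RDFa processor and ignores microdata and microformats. The class and function names and the sample markup are all hypothetical, and a real scraper would more likely use an existing RDFa library.

```python
from html.parser import HTMLParser


class LicenseAnnotationParser(HTMLParser):
    """Collect rel="license" links and cc:* property values (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.licenses = []    # href values of rel="license" links
        self.properties = {}  # e.g. {"cc:attributionName": "Example Author"}
        self._pending = None  # property waiting for its element text

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").split()
        if "license" in rels and attrs.get("href"):
            self.licenses.append(attrs["href"])
        prop = attrs.get("property")
        if prop and prop.startswith("cc:"):
            if attrs.get("href"):
                # A link-valued property takes its value from href.
                self.properties[prop] = attrs["href"]
            else:
                # A text-valued property takes the next non-empty text node.
                self._pending = prop

    def handle_data(self, data):
        if self._pending and data.strip():
            self.properties[self._pending] = data.strip()
            self._pending = None


def extract_annotations(html: str):
    """Return (license_urls, cc_properties) found in an HTML fragment."""
    parser = LicenseAnnotationParser()
    parser.feed(html)
    parser.close()
    return parser.licenses, parser.properties


# Hypothetical sample markup with no xmlns or prefix declaration.
SAMPLE = """
<p>
  <span property="cc:attributionName">Example Author</span>
  <a property="cc:attributionURL" href="https://example.org/photos/42">source</a>
  <a rel="license" href="https://creativecommons.org/licenses/by/3.0/">CC BY 3.0</a>
</p>
"""
```

Calling `extract_annotations(SAMPLE)` yields the license URL plus the two `cc:` properties.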
Those two things (https deed urls, rdfa 1.1 lite & co
support/publishing/documentation) I expect would make the probably
tiny fraction of deeds enriched go up a bit, but more important and
complementary is getting more large sites/widely used software to
publish and consume the annotations. For example Flickr did (may
still) add some RDFa to photo pages, but it was always somewhat
broken. On the consumption side, which is more important IMO, the deed
scraper is it; the intention (again from my perspective) was to close
the loop by introducing a consumer, any consumer, so that the annotations
had *some* visibility, hopefully spurring more (but that spurring requires
a lot more finishing, documentation, and evangelism, which we never got to
for the most part). I haven't followed it closely at all, but maybe
some of Jonas Oberg's work will push in that direction, whether it is
ever reflected in the CC deeds or not.
There was at least discussion several years ago of logging
scraper-scraped metadata so that we could analyze its usage. I don't
remember whether that was set up, but certainly the analysis was never
done. That'd be another thing that could be done, if CC wanted to.
Some info can also be gleaned by crawling the web, or analyzing
others' crawls. I took a look at some low-hanging fruit in that regard
a while back, and it didn't look great ...
http://gondwanaland.com/mlog/2012/01/23/attribution-crawl/
Post by Maarten Zeinstra
I don't know if I totally agree with your statement that creativecommons.org or .nl is a bad attributionURL. If they are reusing the work, then the work itself is visible in its reuse and the original context might not matter. Do you think an attributionURL should be the same as a source URL?
Yes. Consider how much less useful the web would be if you could only
link to a site, not a page within a site. The practice of linking to
the homepage of the site that a resource is on, rather than to the
resource itself, cripples an attribution URL in exactly that way.
If that's too handwavy, suppose you remix one of my images and link
to my homepage as attribution. The intent of the license (I
would have used CC0, but generic "I"...) is that the third party can
take advantage of the license offered by me in the original work. If
they have to dig around on my site to find the work instead of
directly going to it, this advantage is substantially diminished.
Mike