README for RDFGrabber

RDFGrabber is a Zope product for searching content from other web sites, provided they make it available in "RDF":http://www.w3.org/RDF/ format. The benefit of doing it this way is that the data you get is not encumbered with HTML, which gives you more flexibility when applying your own look and feel.

How to use it

First, install RDFGrabber.tgz in the Products folder and restart Zope. You will now be able to create objects of the type "RDF Grabber". The add form asks for four things: the id, the title, the URL or URLs of the RDF file(s), and an optional proxy.

Enter a URL that returns well-formed RDF. If your RDF file is password protected, you can specify authentication parameters like this: http://user:password@host.domain.com/file.rdf

I have created some test files on Zope.org. Prefix all names with http://www.zope.org/Members/EIONET/RDFGrabber/

rdfexample1.rdf -- Simple example

rdfexample2.rdf -- The second example.

rdfexample2.rdf -- The third example.

When you have created the object, you must update (synchronize) it with the content on the remote web server. Click Update to do so. The most common mistake is a badly encoded file, in which case you get a syntax error.

There is also an optional property for a proxy server. Enter the URL of the proxy, as in http://proxy.mycompany.com:8080.

Let's say you have created an RDF Grabber object called articledb. Then insert this in your DTML document to query the RDF:
<dtml-with articledb>
<dtml-in "query(predicate='http://purl.org/metadata/dublin_core#Title')">
	<dtml-with sequence-item>
		<dtml-var subject>
		<dtml-var predicate>
		<dtml-var object>
	</dtml-with>
</dtml-in>
</dtml-with>
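Conceptually, the query call above selects the stored statements that have the given predicate. The following is a minimal Python sketch of that idea, assuming the statements are held as (subject, predicate, object) tuples. The function name query_triples and the sample data are hypothetical and are not taken from the RDFGrabber source:

```python
# Hypothetical illustration of filtering triples by predicate.
# This is not the RDFGrabber implementation.
DC_TITLE = "http://purl.org/metadata/dublin_core#Title"
DC_CREATOR = "http://purl.org/metadata/dublin_core#Creator"

# Each statement is a (subject, predicate, object) tuple; the data is made up.
triples = [
    ("http://example.org/articles/1", DC_TITLE, "First article"),
    ("http://example.org/articles/1", DC_CREATOR, "Jane Doe"),
    ("http://example.org/articles/2", DC_TITLE, "Second article"),
]


def query_triples(triples, predicate=None):
    """Return the triples whose predicate matches; None matches every triple."""
    return [t for t in triples if predicate is None or t[1] == predicate]


# Mirrors the DTML loop: one line per matching statement.
for subject, predicate, obj in query_triples(triples, predicate=DC_TITLE):
    print(subject, predicate, obj)
```

Each matching tuple supplies the three values that the DTML template shows as subject, predicate and object.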
If you want your RDF object to import data on a regular basis, you can write a program that updates the channel by doing a GET on the update method, as in:

  lynx -source http://www.mysite.com/slashdot/update >/dev/null

How it works

An RDF file consists of triples (Subject, Predicate, Object). They are implemented as Python tuples.

A Python dictionary is also known as an associative array. It is kind of like a sack, where you can put all your goodies, each tagged with a keyword you can use to get it back. RDFGrabber parses the RDF file and, for each tag inside the four main parts, stores it under a keyword. Since there are only a few mandatory tags, you must typically check that the dictionary contains an item before you use it.

RDFGrabber supports the core RDF and modules such as syndication and Dublin Core. The support is very simple: it maps the namespaces to easily usable keywords. The Dublin Core has one tag for dates, but RDFGrabber doesn't try to understand the date. It just treats it as a string.

Querying

The RDF file can be queried on its predicates. The query tab shows an example of this as a combo box presenting the parsed predicates. The user can also fill in an arbitrary expression value.

The API

Persistence

The RDF source is not stored in the ZODB, because the object is likely to be updated often, which would make the ZODB grow a lot. Instead it is held as a volatile attribute and dumped to the filesystem using the Python "pickle" module.

Restrictions & peculiarities

Encoding -- The encoding from the XML processing instruction is saved and added to the RDF dictionary.

HTML -- HTML (or XHTML) is not allowed inside an RDF file. This may come as a surprise to some, but it would circumvent what RDF is trying to achieve.

Entities -- All known and unknown entities are supported.

Acknowledgements

The parser is inspired by the Redfoot project.
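
As an illustration of the Persistence section, here is a minimal sketch of dumping parsed triples to the filesystem with the standard pickle module and reading them back. The helper names dump_triples and load_triples, the file name and the sample data are assumptions made for this example; the actual RDFGrabber code may be organised differently:

```python
import os
import pickle
import tempfile


def dump_triples(triples, path):
    """Write the parsed triples to a file on disk instead of the object database."""
    with open(path, "wb") as f:
        pickle.dump(triples, f)


def load_triples(path):
    """Read the triples back; return an empty list if no dump exists yet."""
    if not os.path.exists(path):
        return []
    with open(path, "rb") as f:
        return pickle.load(f)


# Made-up sample data and a temporary file, for demonstration only.
sample = [
    ("http://example.org/articles/1",
     "http://purl.org/metadata/dublin_core#Title",
     "First article"),
]
path = os.path.join(tempfile.mkdtemp(), "articledb.pickle")
dump_triples(sample, path)
print(load_triples(path) == sample)
```

Because the dump lives outside the object database, frequent updates overwrite one file rather than adding new revisions to the ZODB. Pickle files should only be loaded from locations you trust.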