OXPath is a language designed for scalable web data extraction (scraping), crawling and automation.
OXPath extends XPath with actions (e.g., clicking and form filling), the Kleene star for iteration, and markers for data extraction.
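A Kleene star such as `(//a[.#='Next']/{click/})*` applies the bracketed step zero or more times, so it yields the starting page together with every page reached by repeating the step. The Python sketch below models only that closure semantics, not how OXPath implements it. A hypothetical `step` function stands in for one click on a 'Next' link and returns the pages it leads to, and `max_depth` is an assumed safety bound.

```python
from collections import deque
from collections.abc import Callable, Hashable, Iterable
from typing import TypeVar

T = TypeVar("T", bound=Hashable)


def kleene_star(start: T, step: Callable[[T], Iterable[T]], max_depth: int = 50) -> list[T]:
    """Return the union of step^0(start), step^1(start), ... in breadth-first order.

    Nodes already seen are not revisited, and at most `max_depth`
    applications of `step` are followed along any path (hypothetical bound).
    """
    result: list[T] = [start]
    seen: set[T] = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue  # depth bound reached: do not expand this node further
        for nxt in step(node):
            if nxt not in seen:
                seen.add(nxt)
                result.append(nxt)
                frontier.append((nxt, depth + 1))
    return result


# Hypothetical pagination: each page maps to the pages its 'Next' link opens.
next_links = {"page1": ["page2"], "page2": ["page3"], "page3": []}
print(kleene_star("page1", lambda p: next_links[p]))  # ['page1', 'page2', 'page3']
```

For a simple 'Next' chain, breadth-first order lists the pages in the order a user paging through the results would see them, and the `seen` set keeps the iteration from looping on sites whose pagination links cycle back.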

Simple OXPath example
doc('scholar.google.com')
	//input[@type='text']/{"OXPath"}/descendant::button[1]/{click/}/
	(//a[.#='Next']/{click/})*
		//div.gs_ri:<PAPER>
			[.//h3/a:<Title=string(.)>]
			[.//div.gs_rs:<Abstract=string(.)>]
			[.//*.gs_a:<Authors=substring-before(.,' - ')>]
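In the example, `:<PAPER>` marks each matching `div.gs_ri` as a record, and the markers inside the predicates attach `Title`, `Abstract` and `Authors` attributes to it. `substring-before(., ' - ')` keeps the part of the byline before the first ` - `. As a rough illustration only, the Python sketch below assumes the text of each result has already been scraped into hypothetical dicts (keys `title`, `snippet`, `byline`), re-implements XPath 1.0 `substring-before`, and builds one plausible XML shape with the standard library. OXPath's actual XML output layout may differ.

```python
import xml.etree.ElementTree as ET


def substring_before(s: str, sep: str) -> str:
    """XPath 1.0 substring-before: the text before the first `sep`, or '' if absent."""
    idx = s.find(sep)
    return s[:idx] if idx >= 0 else ""


def papers_to_xml(results: list[dict[str, str]]) -> ET.Element:
    """Build one PAPER element per result, with Title, Abstract and Authors children."""
    root = ET.Element("results")  # hypothetical root element name
    for r in results:
        paper = ET.SubElement(root, "PAPER")
        ET.SubElement(paper, "Title").text = r["title"]
        ET.SubElement(paper, "Abstract").text = r["snippet"]
        # Mirrors Authors=substring-before(., ' - ') applied to the byline text.
        ET.SubElement(paper, "Authors").text = substring_before(r["byline"], " - ")
    return root


# Hypothetical, already-scraped text of a single search result.
sample = [{
    "title": "Example title",
    "snippet": "Example abstract text.",
    "byline": "A. Author, B. Author - Example Journal, 2011 - example.org",
}]
print(ET.tostring(papers_to_xml(sample), encoding="unicode"))
```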

OXPath supports XML, RDF and relational database output formats, and custom output handlers can also be provided. OXPath runs in a real browser, so any site that a modern browser can render can be interacted with and extracted exactly as it appears to a user. Since version 2.0, OXPath relies on WebDriver for this (Firefox only). When a full browser is not necessary, OXPath can instead be used with the headless browser emulator HtmlUnit.
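The following Python sketch illustrates the general idea of a pluggable output handler that writes extracted records to a relational database; it is not OXPath's API. The `OutputHandler` protocol, its `record`/`close` methods, and the one-table-per-record-name layout are all hypothetical, and SQLite (via the standard `sqlite3` module) stands in for whatever database a real handler would target.

```python
import sqlite3
from typing import Protocol


class OutputHandler(Protocol):
    """Hypothetical handler interface: receives extracted records one at a time."""

    def record(self, name: str, fields: dict[str, str]) -> None: ...

    def close(self) -> None: ...


class SQLiteHandler:
    """Writes each record into a table named after its record marker (e.g. PAPER).

    Assumes record and field names are trusted identifiers, and that all
    records with the same name carry the same fields.
    """

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def record(self, name: str, fields: dict[str, str]) -> None:
        cols = list(fields)
        col_sql = ", ".join(f'"{c}"' for c in cols)
        # SQLite allows untyped columns, so the schema comes from the first record.
        self.conn.execute(f'CREATE TABLE IF NOT EXISTS "{name}" ({col_sql})')
        placeholders = ", ".join("?" for _ in cols)
        self.conn.execute(
            f'INSERT INTO "{name}" ({col_sql}) VALUES ({placeholders})',
            [fields[c] for c in cols],
        )

    def close(self) -> None:
        self.conn.commit()


conn = sqlite3.connect(":memory:")
handler: OutputHandler = SQLiteHandler(conn)
handler.record("PAPER", {"Title": "Example title", "Authors": "A. Author, B. Author"})
handler.close()
print(conn.execute('SELECT "Title", "Authors" FROM "PAPER"').fetchall())
# [('Example title', 'A. Author, B. Author')]
```

Creating tables lazily from the first record's fields keeps this sketch schema-free, at the cost of assuming that every record with the same name carries the same fields. Names are interpolated into SQL as quoted identifiers, so they must come from trusted markers rather than from page content.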

For citation details and a full description of OXPath and its semantics, please refer to the article here.

Download OXPath 2.0 and get started with the examples.