Thank you Al and Abhishake!
The JTidy library is what I was looking for. I wanted to convert an
external HTML page to an org.w3c.dom.Document object, and JTidy does
exactly that.
Now I want to put some kind of wrapper around the Document object so I
can work with it in minilang.
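
A minimal sketch of what such a wrapper might look like, assuming XPath
lookups are enough for the data extraction. The class name DocumentWrapper
and its methods text, texts and fromXhtml are hypothetical, not existing
OFBiz or JTidy API. To keep the sketch self-contained, fromXhtml builds the
Document with the JDK parser from well-formed XHTML; in practice the
Document would come from JTidy instead.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical wrapper that exposes simple XPath lookups on a DOM Document. */
final class DocumentWrapper {
    private final Document document;
    private final XPath xpath = XPathFactory.newInstance().newXPath();

    DocumentWrapper(Document document) {
        this.document = document;
    }

    /** Text of the first node matching the expression, or "" if nothing matches. */
    String text(String expression) throws XPathExpressionException {
        return xpath.evaluate(expression, document).trim();
    }

    /** Text of every node matching the expression, in document order. */
    List<String> texts(String expression) throws XPathExpressionException {
        NodeList nodes = (NodeList) xpath.evaluate(expression, document, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent().trim());
        }
        return result;
    }

    /** Stand-in for the JTidy step: parses well-formed XHTML with the JDK parser. */
    static DocumentWrapper fromXhtml(String xhtml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        return new DocumentWrapper(doc);
    }

    public static void main(String[] args) throws Exception {
        DocumentWrapper page = fromXhtml(
                "<html><head><title>Prices</title></head>"
                + "<body><p class=\"item\">Widget</p><p class=\"item\">Gadget</p></body></html>");
        System.out.println(page.text("//title"));
        System.out.println(page.texts("//p[@class='item']"));
    }
}
```

Returning plain String and List values rather than DOM nodes should make
the results easier to hand to minilang, which works with fields and lists.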
-Adrian
Abhishake Agarwal wrote:
> Hello Adrian,
>
> I don't know whether OFBiz has this, but I have done a similar thing using
> an API called HTML Parser. You can search for it on Google.
>
> Regards,
> Abhishake
>
> On Tue, Jul 8, 2008 at 10:50 PM, Al Byers <[hidden email]> wrote:
>
>> Adrian,
>>
>> In the past I have used JTidy to make sure the markup is XHTML and then
>> written Freemarker scripts to process it. I find FM easier to use than
>> XSLT because it has a loop index var and it is easier to connect with
>> Java classes that you may wish to write to help in the processing.
>>
>> -Al
>>
>> On Tue, Jul 8, 2008 at 10:53 AM, Adrian Crum <[hidden email]> wrote:
>>
>>> I need OFBiz to gather data from external websites - so that data can be
>>> extracted from the HTML. Is there anything like that in OFBiz? Has anyone
>>> else done something similar?
>>>
>>> -Adrian
>>>
>