NetTalk Central

Author Topic: best approach to extracting data from a page  (Read 6881 times)

stephen j ryan

  • Newbie
  • Posts: 10
best approach to extracting data from a page
« on: September 27, 2013, 11:41:02 AM »
Hi everyone,

I am using NetTalk to download web pages.

Those pages contain data laid out in tables (rows and columns).

Is there a way to access these rows and columns using StringTheory, or any other approach?

We have a big, bulky parser, but it's not ideal for this type of job.

Many thanks,
Steve

Rene Simons

  • Hero Member
  • Posts: 650
Re: best approach to extracting data from a page
« Reply #1 on: October 01, 2013, 09:13:08 AM »
Hi Stephen,

StringTheory is probably the best option here.

Cheers,
Rene
Rene Simons
NT14.14

Bruce

  • Global Moderator
  • Hero Member
  • Posts: 11250
Re: best approach to extracting data from a page
« Reply #2 on: October 02, 2013, 07:13:03 AM »
Yeah, I've done this with StringTheory on some pages: splitting first on <tr>, then on <td>, and so on,
then removing all the remaining tags using the Replace method.

cheers
Bruce
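
The split-then-strip approach Bruce describes can be sketched roughly as follows. This is Python, not Clarion, and it does not use the StringTheory API; the function name `extract_table_cells` and the sample page are made up for illustration. It assumes simple, non-nested tables, and it also picks up <th> header cells, which goes slightly beyond the <tr>/<td> split mentioned in the post.

```python
import re
from html import unescape


def extract_table_cells(page_html):
    """Return a list of rows, each row being a list of cell strings.

    Hypothetical helper: split on <tr>, then on <td>/<th>, then strip
    whatever tags are left. Nested tables are not handled.
    """
    rows = []
    # Everything before the first <tr> is not row data, so skip chunk 0.
    for row_chunk in re.split(r"(?i)<tr\b[^>]*>", page_html)[1:]:
        # Keep only the part of the chunk that precedes the closing </tr>.
        row_chunk = re.split(r"(?i)</tr\s*>", row_chunk)[0]
        cells = []
        for cell_chunk in re.split(r"(?i)<t[dh]\b[^>]*>", row_chunk)[1:]:
            cell_chunk = re.split(r"(?i)</t[dh]\s*>", cell_chunk)[0]
            # Remove any remaining tags (<b>, <a ...>, and so on).
            text = re.sub(r"<[^>]+>", "", cell_chunk)
            # Decode entities such as &amp; and trim whitespace.
            cells.append(unescape(text).strip())
        if cells:
            rows.append(cells)
    return rows


sample = """
<table>
  <tr><th>Name</th><th>Qty</th></tr>
  <tr><td><b>Widget</b></td><td align="right">3</td></tr>
  <tr><td>Nut &amp; bolt</td><td>12</td></tr>
</table>
"""

rows = extract_table_cells(sample)
# rows == [["Name", "Qty"], ["Widget", "3"], ["Nut & bolt", "12"]]
```

Splitting on tags like this is tolerant of sloppy markup, which is probably why it suits downloaded pages, but it will misread a table that contains another table inside a cell.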

rjolda

  • Sr. Member
  • Posts: 329
Re: best approach to extracting data from a page
« Reply #3 on: October 13, 2013, 07:02:38 AM »
Hi Steve,
We use XFiles to manage table data from some web pages with a known structure.
It works like a charm, and the XFiles data is then easy to manipulate.
FWIW,
Ron Jolda
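
As a rough illustration of Ron's idea of treating a page with a known structure as structured data, here is a sketch in Python using the standard-library XML parser. It is not XFiles code. It assumes the table fragment is well-formed XML with no namespace declaration; real-world HTML often is not, and entities such as &nbsp; would make this parser fail. The function name `table_rows_from_xhtml` and the sample fragment are hypothetical.

```python
import xml.etree.ElementTree as ET


def table_rows_from_xhtml(fragment):
    """Parse a well-formed table fragment and return rows of cell text.

    Hypothetical helper; assumes un-namespaced tr/td/th elements.
    """
    root = ET.fromstring(fragment)
    rows = []
    for tr in root.iter("tr"):
        # itertext() gathers text from nested elements such as <i> or <a>.
        cells = [
            "".join(cell.itertext()).strip()
            for cell in tr
            if cell.tag in ("td", "th")
        ]
        rows.append(cells)
    return rows


fragment = (
    "<table>"
    "<tr><td>Widget</td><td>3</td></tr>"
    "<tr><td>Nut &amp; <i>bolt</i></td><td>12</td></tr>"
    "</table>"
)

parsed = table_rows_from_xhtml(fragment)
# parsed == [["Widget", "3"], ["Nut & bolt", "12"]]
```

Compared with splitting on tags, a real parser copes with nested elements cleanly, but only when the page's structure is predictable and valid, which matches the "known structure" condition in the post.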