A Service of Softnik Technologies

Whois Parse / Extraction Table Configuration

Configurable Data Extraction

Watch My Domains is designed to be highly user-configurable. This allows the software to adapt quickly to format changes in whois data or to new TLDs that may become available from time to time.

This article explains the whois extraction table used in Watch My Domains SED.

Changing Data Extraction Settings

You can access the 'Whois Setup' panel from the side toolbar.

Whois Setup

Type in a TLD to retrieve its current settings, and then click the 'Extraction' tab. Please see the screenshot below.

[Screenshot: Whois Extraction Token Setup]

Parse Table

The parse table consists of a set of entries that look like

token=>column

For example,

Last Update:=>last_update

... will look for the token Last Update: in each line of the raw WHOIS text and extract whatever follows it into the last_update domain data column.
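To make the token=>column idea concrete, here is a minimal Python sketch of how a single entry could be split and applied to one WHOIS line. It is not the actual Watch My Domains SED parser. The function names parse_entry and extract are hypothetical, and the sample WHOIS text is made up for illustration.

```python
from typing import Optional, Tuple

def parse_entry(entry: str) -> Tuple[str, str]:
    """Split a hypothetical 'token=>column' entry into (token, column)."""
    # Split on the last '=>' so the token part keeps any other characters intact.
    token, sep, column = entry.rpartition("=>")
    if not sep:
        raise ValueError(f"not a parse table entry: {entry!r}")
    return token, column.strip()

def extract(raw_whois: str, token: str) -> Optional[str]:
    """Return whatever follows the token on the first line that contains it."""
    for line in raw_whois.splitlines():
        idx = line.find(token)
        if idx >= 0:
            return line[idx + len(token):].strip()
    return None  # token not present in the WHOIS text

token, column = parse_entry("Last Update:=>last_update")
raw = "Domain Name: softnik.com\nLast Update: 2023-01-15\n"  # made-up sample
print(column, extract(raw, token))  # last_update 2023-01-15
```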

If the data follows a unique token, you only need to enter that token. For example, if the registry WHOIS contains a line like

Registrant Organization:Softnik Technologies

... the whois parse table entry will be

Registrant Organization:=>Owner
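A whole table of plain entries can be applied the same way. The sketch below, with the hypothetical function name apply_parse_table, runs each entry against the raw text and builds a column-to-value mapping. It handles only plain tokens (no {nl}, {ml}, {n} or @@ modifiers) and keeps only the first match per entry. Both are simplifying assumptions, not the product's documented behavior.

```python
from typing import Dict

def apply_parse_table(raw_whois: str, table_text: str) -> Dict[str, str]:
    """Apply plain 'token=>column' entries (one per line) to raw WHOIS text."""
    result: Dict[str, str] = {}
    lines = raw_whois.splitlines()
    for entry in table_text.splitlines():
        token, sep, column = entry.strip().rpartition("=>")
        if not sep or not token:
            continue  # skip blank or malformed entries
        for line in lines:
            idx = line.find(token)
            if idx >= 0:
                # The token need not be followed by a space; just take the rest.
                result[column.strip()] = line[idx + len(token):].strip()
                break  # first match only in this sketch
    return result

raw = "Registrant Organization:Softnik Technologies\n"
print(apply_parse_table(raw, "Registrant Organization:=>Owner"))
```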

If the data is on the next line, you can add {nl} to the token. For example, if the whois data shows

..
..
Registrant:
  Softnik Technologies
..
..

The parse table entry will be

Registrant:{nl}=>Owner
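The {nl} modifier can be pictured as "find the token line, then read the line after it." The following Python sketch shows that reading. The function name extract_next_line is hypothetical, and surrounding whitespace on the next line is assumed to be trimmed.

```python
from typing import Optional

NL = "{nl}"

def extract_next_line(raw_whois: str, token: str) -> Optional[str]:
    """Handle a token ending in {nl}: return the line after the token line."""
    bare = token[: -len(NL)] if token.endswith(NL) else token
    lines = raw_whois.splitlines()
    for i, line in enumerate(lines):
        if bare in line and i + 1 < len(lines):
            # Assumption: leading indentation on the value line is dropped.
            return lines[i + 1].strip()
    return None

raw = "..\nRegistrant:\n  Softnik Technologies\n..\n"
print(extract_next_line(raw, "Registrant:{nl}"))  # Softnik Technologies
```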

Multiple Occurrences of a Token

If a token occurs multiple times, all occurrences are collected and extracted as a single value, separated by commas.
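Here is a small Python sketch of the "collect every occurrence" behavior. The function names are hypothetical. The article only says a comma separates the values, so the exact separator (with or without a trailing space) is an assumption here.

```python
from typing import List

def extract_all(raw_whois: str, token: str) -> List[str]:
    """Collect the value after every occurrence of the token, in order."""
    values = []
    for line in raw_whois.splitlines():
        idx = line.find(token)
        if idx >= 0:
            values.append(line[idx + len(token):].strip())
    return values

def extract_joined(raw_whois: str, token: str) -> str:
    # Assumed separator: a bare comma. The product's exact separator may differ.
    return ",".join(extract_all(raw_whois, token))

raw = "organization: Alpha\norganization: Beta\n"
print(extract_joined(raw, "organization:"))  # Alpha,Beta
```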

You can use {n} (where n is a digit) in the WHOIS parser token to pick a specific index when there are multiple occurrences. For example,

organization:{2}=>organization

... will pick the 2nd occurrence of the organization: entry in the WHOIS output.
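The {n} suffix selects one occurrence by its 1-based position, so {2} means the second match. The Python sketch below, using the hypothetical function name extract_indexed, strips a single-digit {n} suffix, gathers every occurrence of the remaining token and returns the requested one. It returns None when there are fewer matches than requested, which is an assumption about how a missing index is handled.

```python
import re
from typing import List, Optional

INDEX = re.compile(r"\{(\d)\}$")  # a single digit, as described above

def extract_indexed(raw_whois: str, token: str) -> Optional[str]:
    """Handle a token ending in {n}: return the n-th (1-based) occurrence."""
    m = INDEX.search(token)
    if m is None:
        raise ValueError("token has no {n} suffix")
    n = int(m.group(1))
    bare = token[: m.start()]
    values: List[str] = []
    for line in raw_whois.splitlines():
        idx = line.find(bare)
        if idx >= 0:
            values.append(line[idx + len(bare):].strip())
    return values[n - 1] if 1 <= n <= len(values) else None

raw = "organization: Alpha\norganization: Beta\norganization: Gamma\n"
print(extract_indexed(raw, "organization:{2}"))  # Beta
```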

Multiple Entries with a Heading Token

Sometimes there will be a heading token with a number of rows below it. For example...

...
Name Servers:
ns1.softnik.com
ns2.softnik.com
ns3.softnik.com
...
...

In such cases, use {ml} in the token to indicate that the lines following the token should all be extracted.

Name Servers:{ml}=>name_servers
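The {ml} modifier reads a run of lines under a heading. The article does not say where such a run ends. The Python sketch below therefore assumes that it stops at the first blank line, or at a line that itself ends in a colon and so looks like a new heading. The function name extract_multiline is hypothetical.

```python
from typing import List

ML = "{ml}"

def extract_multiline(raw_whois: str, token: str) -> List[str]:
    """Handle a token ending in {ml}: collect the lines that follow it.

    Assumption: the run ends at the first blank line or at a line that
    looks like a new 'Heading:' token.
    """
    bare = token[: -len(ML)] if token.endswith(ML) else token
    lines = raw_whois.splitlines()
    for i, line in enumerate(lines):
        if bare in line:
            collected = []
            for follow in lines[i + 1:]:
                text = follow.strip()
                if not text or text.endswith(":"):
                    break
                collected.append(text)
            return collected
    return []

raw = "Name Servers:\nns1.softnik.com\nns2.softnik.com\nns3.softnik.com\n\nStatus: ok\n"
print(",".join(extract_multiline(raw, "Name Servers:{ml}")))
```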

Entries That Appear in Multiple Blocks

Sometimes the same token will appear in multiple blocks. For example, the 'Owner' token can appear under 'Administrative Contact', 'Registrant' and 'Technical Contact'. If you only want the 'Owner' under 'Registrant', use...

Registrant@@Owner:=>registrant
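One way to picture the Block@@Token form is to skip ahead to the block heading and only then search for the token. The Python sketch below follows that reading. It assumes that a block runs from its heading to the next blank line, which may not match every registry's layout or the product's real rule. The function name extract_in_block and the sample contacts are hypothetical.

```python
from typing import Optional

def extract_in_block(raw_whois: str, token: str) -> Optional[str]:
    """Handle 'Block@@Token': search for Token only inside the Block section.

    Assumption: a block runs from its heading line to the next blank line.
    """
    block, sep, inner = token.partition("@@")
    if not sep:
        raise ValueError("token has no @@ block prefix")
    in_block = False
    for line in raw_whois.splitlines():
        if not in_block:
            in_block = block in line  # heading found; start searching after it
            continue
        if not line.strip():
            break  # end of the block (assumed)
        idx = line.find(inner)
        if idx >= 0:
            return line[idx + len(inner):].strip()
    return None

raw = (
    "Administrative Contact:\nOwner: Admin Person\n\n"
    "Registrant:\nOwner: Softnik Technologies\n\n"
    "Technical Contact:\nOwner: Tech Person\n"
)
print(extract_in_block(raw, "Registrant@@Owner:"))  # Softnik Technologies
```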

Defining the Parse Table

When you make changes to the parse table, the application assumes that you are providing definitions for all columns and no longer uses its internal parse table. To keep the internal parse table in use alongside your own entries, add [*]=>* as the first entry. In most cases you want this. However, if you are going to define all the extraction tokens manually, don't insert that entry.

Here is an example.

[*]=>*
Registrant Address:=>address
Registrant:@@Name:=>owner
Renewal date{nl}=>registrar_expiry
Expiry date:=>expiry_date
Name servers:{ml}=>name_servers
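The effect of the [*]=>* entry can be sketched as a choice between two tables. The article does not say how conflicts are resolved when both tables define the same column. The Python sketch below assumes that your entries win for any column they define and that internal entries fill in the rest. The internal entries and the function name effective_table are hypothetical.

```python
from typing import List, Tuple

Entry = Tuple[str, str]  # (token, column)
WILDCARD: Entry = ("[*]", "*")

def effective_table(user: List[Entry], internal: List[Entry]) -> List[Entry]:
    """Decide which parse entries are used (a sketch, not the real logic)."""
    if user and user[0] == WILDCARD:
        custom = user[1:]
        covered = {column for _, column in custom}
        # Assumption: user entries take precedence for the columns they define;
        # internal entries supply all remaining columns.
        return custom + [e for e in internal if e[1] not in covered]
    # Without [*]=>* as the first entry, the internal table is not used.
    return list(user)

internal = [("Registrant:", "owner"), ("Expiry:", "expiry_date")]  # hypothetical
user = [WILDCARD, ("Expiry date:", "expiry_date")]
print(effective_table(user, internal))
```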

Parsing the Whois Data Again

You can use the 'Parse Whois' button on the side toolbar to re-parse the WHOIS data after making changes to the extraction table, without performing a new WHOIS lookup.

Please also see

Extracting Data into Custom Columns