As a head start, our Commerce Integrations library includes a base class for web scraping connectors. If you decide to use web scraping, you will need to write your own connector that extends our ScrapingConnector class.


A ScrapingConnector is an ordinary JavaScript class whose methods take plain JavaScript objects as arguments and return plain JavaScript objects parsed from a web page. (Returning JavaScript objects instead of DOM elements makes it possible to use the connector in different contexts, especially outside of the browser.) A ScrapingConnector should never return DOM nodes or accept browser-native types, such as FormData, as method arguments.
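
As a small illustration of this contract, browser-native form data can be flattened into a plain object before it reaches a connector method. The sketch below is only an assumption about how you might do that: the helper name entriesToPlainObject is hypothetical and not part of Commerce Integrations. It accepts any iterable of [key, value] pairs, which is what FormData's entries() method yields in the browser.

```javascript
// Hypothetical helper (not part of Commerce Integrations): copies
// [key, value] pairs into a plain object. If a key repeats, the last
// value wins.
function entriesToPlainObject(entries) {
    const result = {}
    for (const [key, value] of entries) {
        result[key] = value
    }
    return result
}

// In the browser you might call:
//     entriesToPlainObject(new FormData(formElement).entries())
// An array of pairs stands in for FormData here so the sketch runs anywhere.
// The field names are made up for illustration.
const plainCartItem = entriesToPlainObject([
    ['productId', '123'],
    ['quantity', '2']
])
```

The resulting object can be passed to a connector method on the client or the server, because nothing in it depends on browser-only types.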


The ScrapingConnector class does the following:


  1. Requires injection of the window object so that you can write a web scraping connector that works both in the browser and on the server. To use it on the server, you can swap the native browser window instance for a JSDOM implementation of the browser window.

  2. Includes the superagent library, which allows you to make HTTP requests to your backend both in the browser and on the server.

  3. Provides a utility function called buildDocument, which safely constructs a DOM document that you can use to extract data from the page. You can query the resulting document with familiar, browser-native selector functions, such as querySelector.


Extracting data from a page


import {ScrapingConnector} from '@mobify/commerce-integrations/dist/connectors/scraping-connector'

class MyConnector extends ScrapingConnector {

    /**
     * An example showing how to make an HTTP GET request to get
     * and parse data for a Product Detail Page.
     *
     * @param {Number} id
     */
    getProduct(id) {
        const url = `${this.basePath}/products/${id}`
        return this.agent
            .get(url)
            .then((res) => this.buildDocument(res))
            .then((htmlDoc) => {
                return {
                    name: htmlDoc.querySelector('.page-title').textContent,
                    description: htmlDoc.querySelector('.product.description').textContent,
                    //...
                }
            })
    }
}


In the example above, the implementation uses this.agent to fetch an HTML response and this.buildDocument to build a document from it. It then parses the data for the product detail page, including the product name and product description. Note that this.agent refers to the superagent library.
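
One thing to watch for when extracting data: querySelector returns null when nothing matches, so reading textContent directly throws if the page markup changes. A minimal sketch of a null-safe lookup follows. The helper name textOf is hypothetical, and the stand-in document and its sample text are made up so that the snippet runs without a DOM.

```javascript
// Hypothetical helper: returns the trimmed text of the first element
// matching the selector, or null when nothing matches.
function textOf(doc, selector) {
    const el = doc.querySelector(selector)
    return el ? el.textContent.trim() : null
}

// Minimal stand-in for a parsed document. In a real connector this would
// be the document returned by this.buildDocument.
const fakeDoc = {
    querySelector(selector) {
        const nodes = {'.page-title': {textContent: '  Blue Shirt \n'}}
        return nodes[selector] || null
    }
}

const product = {
    name: textOf(fakeDoc, '.page-title'),
    // No matching element in the stand-in document, so this is null
    // instead of a thrown TypeError.
    description: textOf(fakeDoc, '.product.description')
}
```

Whether a missing field should become null, an empty string, or an error is a design choice for your connector. The sketch picks null only to keep the example short.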


Submitting data to a server


import {ScrapingConnector} from '@mobify/commerce-integrations/dist/connectors/scraping-connector'

class MyConnector extends ScrapingConnector {

    /**
     * An example showing how to make an HTTP POST request to add an
     * item to a user's cart.
     *
     * @param {Object} cart
     * @param {Object} cartItem
     */
    addCartItem(cart, cartItem) {
        return this.agent
            .post(`${this.basePath}/carts/${cart.id}`)
            .type('form') // The data will be form-encoded
            .send(cartItem)
    }
}


In the example above, our implementation uses this.agent to make an HTTP POST request that adds an item to a user’s cart. The cartItem object is sent to the server as form-encoded data.
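
To make "form-encoded" concrete, the sketch below shows roughly what a flat plain object looks like once it is serialized as application/x-www-form-urlencoded. It uses the standard URLSearchParams class rather than superagent, so treat it as an approximation of the request body and not as superagent's own serializer. The cart item fields are made up for illustration.

```javascript
// Assumed shape of a cart item; your backend will define the real fields.
const exampleItem = {productId: '123', quantity: '2'}

// URLSearchParams serializes key/value pairs in the
// application/x-www-form-urlencoded format.
const encodedBody = new URLSearchParams(exampleItem).toString()

console.log(encodedBody) // prints "productId=123&quantity=2"
```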


Client-side versus server-side setup


In your app, you need to inject the window object into your connector so that it can work on both the client and the server. On the server, use a window created by JSDOM. On the client, use the browser-native window object. (If server-side rendering is new to you, take a few minutes to learn more about it.)


For example:


// Server-side example
import jsdom from 'jsdom'

jsdom.JSDOM.fromURL('http://www.example.com')
    .then((dom) => new MyConnector({window: dom.window}))

On the server side, we use JSDOM to create a window object and inject it into the connector’s constructor.


// Client-side example
const connector = new MyConnector({window: window})

On the client side, we inject the browser-native window object into the connector’s constructor.
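
If the same module runs in both environments, you may want a single place that decides which window to inject. The following is a minimal sketch under that assumption. The helper name hasNativeWindow is hypothetical, and the commented usage reuses MyConnector and jsdom from the examples above.

```javascript
// Hypothetical helper: reports whether the given global object exposes a
// browser-style window with a document.
function hasNativeWindow(globalObject) {
    return typeof globalObject.window !== 'undefined' &&
        typeof globalObject.window.document !== 'undefined'
}

// Possible usage (commented out because it needs MyConnector and jsdom):
//
// const connectorPromise = hasNativeWindow(globalThis)
//     ? Promise.resolve(new MyConnector({window: globalThis.window}))
//     : jsdom.JSDOM.fromURL('http://www.example.com')
//         .then((dom) => new MyConnector({window: dom.window}))

// Stand-in global objects show both outcomes without a real browser.
const inBrowser = hasNativeWindow({window: {document: {}}})
const onServer = hasNativeWindow({})
```

Wrapping both branches in a promise keeps the calling code identical on the client and the server, since JSDOM.fromURL is asynchronous.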