As a head start, our Commerce Integrations library includes a base class for web scraping connectors. If you decide to use web scraping, you will need to write your own connector that extends our ScrapingConnector class.
A ScrapingConnector is an ordinary JavaScript class whose methods take plain JavaScript objects as arguments and return plain JavaScript objects parsed from a web page. (Returning JavaScript objects instead of DOM elements makes it possible to use the connector in different contexts, especially outside of the browser.) A ScrapingConnector should never return DOM nodes, and its methods should never take browser-native types such as FormData as arguments.
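One way to follow this rule at the call site is to convert browser-native form data into a plain object before it reaches the connector. The sketch below assumes a hypothetical helper, entriesToObject, which is not part of the Commerce Integrations library. It accepts any iterable of [name, value] pairs, which is what a browser's FormData yields when iterated.

```javascript
// Hypothetical helper (not part of Commerce Integrations): turn an iterable
// of [name, value] pairs into a plain object that a connector method can take.
// If a name appears more than once, the last value wins.
const entriesToObject = (entries) => Object.fromEntries(entries)

// In a browser you might pass `new FormData(formElement)` here.
// An array of pairs stands in for it so the sketch runs anywhere.
const cartItem = entriesToObject([
    ['productId', '123'],
    ['quantity', '2']
])

console.log(cartItem.productId, cartItem.quantity)
```

The field names productId and quantity are placeholders; use whatever fields your backend expects.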
The ScrapingConnector class does the following:
Requires injection of the window object, so that you can write a web scraping connector that works both inside the browser and on the server side. To use it on the server side, you can swap the native browser window instance for a JSDOM implementation of the browser.
Includes the superagent library, which allows you to make HTTP requests to your backend both in the browser and on the server.
Provides a utility function called buildDocument, which safely constructs a DOM document that you can use to extract data from the page. The resulting document supports familiar, browser-native selector functions such as querySelector.
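To make the window-injection idea concrete, here is a minimal sketch of the pattern. It is an illustration only: InjectedWindowExample and parseHtml are hypothetical names, and this is not how ScrapingConnector or buildDocument are actually implemented. It assumes only that the injected window exposes a DOMParser, as browser and JSDOM windows do.

```javascript
// Illustration only: a hypothetical stand-in, NOT the real ScrapingConnector.
class InjectedWindowExample {
    constructor({window} = {}) {
        if (!window) {
            throw new Error('A window object must be injected')
        }
        // Keep the injected window instead of reaching for a global one.
        this.window = window
    }

    // Parse an HTML string with the DOMParser supplied by the injected window.
    parseHtml(html) {
        const parser = new this.window.DOMParser()
        return parser.parseFromString(html, 'text/html')
    }
}

// A stub window keeps this sketch runnable without a browser or JSDOM.
const stubWindow = {
    DOMParser: class {
        parseFromString(html, mimeType) {
            return {html, mimeType}
        }
    }
}
const example = new InjectedWindowExample({window: stubWindow})
const parsed = example.parseHtml('<h1 class="page-title">Shoe</h1>')
```

Because the class never touches a global window, the same code can run with a browser window, a JSDOM window, or a test stub like the one above.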
Extracting data from a page
    import {ScrapingConnector} from '@mobify/commerce-integrations/dist/connectors/scraping-connector'

    class MyConnector extends ScrapingConnector {
        /**
         * An example showing how to make an HTTP GET request to get
         * and parse data for a Product Detail Page.
         *
         * @param id {Number}
         */
        getProduct(id) {
            const url = `${this.basePath}/products/${id}`
            return this.agent
                .get(url)
                .then((res) => this.buildDocument(res))
                .then((htmlDoc) => {
                    return {
                        name: htmlDoc.querySelector('.page-title').textContent,
                        description: htmlDoc.querySelector('.product.description').textContent,
                        //...
                    }
                })
        }
    }
In the example above, our implementation uses this.agent and this.buildDocument to fetch an HTML response, build a document, and then parse the data for the product detail page, including the product name and description. Note that this.agent refers to the superagent library.
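One caveat when parsing: querySelector returns null when nothing matches, so reading textContent directly throws if the page markup changes. A small guard avoids that. The sketch below uses two hypothetical helpers, textOf and parseProduct, which are not part of the library, and reuses the selectors from the example above.

```javascript
// Hypothetical helper: return the trimmed text of the first element that
// matches the selector, or null when nothing matches.
const textOf = (doc, selector) => {
    const node = doc.querySelector(selector)
    return node ? node.textContent.trim() : null
}

// Hypothetical parser that could build the object returned by getProduct().
// It returns a plain object, never DOM nodes.
const parseProduct = (htmlDoc) => ({
    name: textOf(htmlDoc, '.page-title'),
    description: textOf(htmlDoc, '.product.description')
})
```

Keeping the parsing in a standalone function like this also makes it straightforward to test against a stub or a saved HTML fixture, without making an HTTP request.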
Submitting data to a server
    import {ScrapingConnector} from '@mobify/commerce-integrations/dist/connectors/scraping-connector'

    class MyConnector extends ScrapingConnector {
        /**
         * An example showing how to make an HTTP POST request to add an
         * item to a user's cart.
         *
         * @param cart {Object}
         * @param cartItem {Object}
         */
        addCartItem(cart, cartItem) {
            return this.agent
                .post(`${this.basePath}/carts/${cart.id}`)
                .type('form') // The data will be form-encoded
                .send(cartItem)
        }
    }
In the example above, our implementation uses this.agent to make an HTTP POST request that sends the cart item as form-encoded data, adding the item to the user’s cart.
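To see roughly what "form-encoded" means on the wire, the sketch below serializes a flat object with the standard URLSearchParams class. This only illustrates the application/x-www-form-urlencoded format; superagent uses its own serializer, which may treat nested values differently. The cartItem fields are placeholders.

```javascript
// Illustration of the application/x-www-form-urlencoded format using the
// standard URLSearchParams class (not superagent's internal serializer).
const cartItem = {productId: '123', quantity: '2'} // hypothetical fields
const body = new URLSearchParams(cartItem).toString()

console.log(body) // productId=123&quantity=2
```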
Client-side versus server-side setup
In your app, you need to inject the window object into your connector to ensure that it can work on both the client and the server. On the server, you'll want to use JSDOM. On the client, you can use the browser-native window object. (If server-side rendering is new to you, take a few minutes to learn more about server-side rendering.)
For example:
    // Server-side example
    import jsdom from 'jsdom'

    jsdom.JSDOM.fromURL('http://www.example.com')
        .then((dom) => new MyConnector({window: dom.window}))
On the server side, we use JSDOM to create the window object that we inject into the connector’s constructor.
    // Client-side example
    const connector = new MyConnector({window: window})
On the client side, we inject the browser-native window object into the connector’s constructor.
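If the same module runs in both environments, one possible approach is to choose the window at runtime. The sketch below is an assumption about how you might structure this, not something the library provides: resolveWindow is a hypothetical helper, and the server-side factory is supplied by the caller.

```javascript
// Hypothetical helper: prefer the browser's global window; otherwise call a
// caller-supplied factory (on the server, for example, one that returns
// `new JSDOM('').window`).
const resolveWindow = (createServerWindow) =>
    typeof window !== 'undefined' ? window : createServerWindow()
```

You would then pass the result to your connector, for example new MyConnector({window: resolveWindow(factory)}), where factory is your own function. Note that JSDOM.fromURL returns a promise, so a factory built on it would need to be awaited before constructing the connector.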