
Merchant Sight

Several of our customers use Merchant Sight, our flagship machine-learning product, in batch-processing mode.

Instead of using the real-time API for their UI apps, these customers prefer to send millions of requests and simply collect the data.

In this tutorial, we will show you how to do precisely that!

Since data collection and analysis are often a job for a data scientist or analyst, this tutorial uses the Python programming language and the pandas library.

Writing the script is a very simple coding exercise; we will first show and discuss every piece of the script, and then put all the pieces together in one file (provided at the end of this page).

Get the API wrapper

First, to make your coding job easier, clone (or copy) the Python module for interacting with our API, which is available on our GitHub page.

Put the module pentadata_api.py in the same folder as your script. That way, you will be able to use from pentadata_api import PentaApi in your Python code.

The Sight request

This simple function implements a request to our MerchantSight API:

def merchant_sight(penta: PentaApi,
                   textstring: str,
                   city: str = None,
                   postal_code: str = None) -> list:
    """Request to Merchant Sight API; returns the list of matches."""
    payload = {
        'textstring': textstring,
        'city': city,
        'postal_code': postal_code
    }
    url = 'https://api.pentadatainc.com/merchants/sight'
    response = penta.post(url, json=payload)
    if response.status_code != 200:
        logging.error('Error >> %s', response.text)
        # Stop here: an error body has no 'sight' key to parse.
        raise RuntimeError('Merchant Sight request failed with status %s'
                           % response.status_code)
    data = response.json()['sight']
    logging.debug('%s >> %s', textstring, data)
    return data

As you can see, it sends a standard HTTP request to our API through the PentaApi object, which automagically handles the login/refresh calls for you. More on this object in a minute!
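To see what the function sends and returns without touching the network, the wrapper can be swapped for a stand-in. The sketch below rests on several assumptions: FakePenta and FakeResponse are hypothetical test doubles, not part of pentadata_api; the response shape (a 'sight' list whose items have 'name' and 'address') is inferred from how this tutorial reads the result; the sample merchant is made up; and raising on a non-200 status is one possible way to handle errors.

```python
import logging


class FakeResponse:
    """Hypothetical stand-in for the HTTP response object."""

    def __init__(self, status_code, body):
        self.status_code = status_code
        self._body = body
        self.text = str(body)

    def json(self):
        return self._body


class FakePenta:
    """Hypothetical stand-in for PentaApi: records calls, returns canned data."""

    def __init__(self):
        self.calls = []

    def post(self, url, json=None):
        self.calls.append((url, json))
        # Canned, made-up match shaped like the data this tutorial reads.
        body = {'sight': [{'name': 'Example Coffee', 'address': '1 Main St'}]}
        return FakeResponse(200, body)


def merchant_sight(penta, textstring, city=None, postal_code=None):
    """Compact version of the tutorial's request function."""
    payload = {'textstring': textstring, 'city': city, 'postal_code': postal_code}
    response = penta.post('https://api.pentadatainc.com/merchants/sight',
                          json=payload)
    if response.status_code != 200:
        logging.error('Error >> %s', response.text)
        raise RuntimeError('request failed')
    return response.json()['sight']


fake = FakePenta()
matches = merchant_sight(fake, 'EXAMPLE COFFEE 0042', city='Springfield')
sent_url, sent_payload = fake.calls[0]
print(matches[0]['name'])
print(sent_payload)
```

Because city and postal_code default to None, the recorded payload shows them being sent as empty values whenever the caller leaves them out.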

Many Sight requests!

As mentioned above, data scientists usually prefer to use this API by sending a lot of requests and collecting all the data for further analysis.

If you are in this situation, then you’re likely using the DataFrame object for your data. Thus, here’s a function that calls the MerchantSight API for all data in a DataFrame.

Note that the DataFrame is modified in place, and filled with the data returned by the API.

def predict(penta: PentaApi, df: pd.DataFrame) -> list:
    """
    Fill the data frame in place with the top results from the API.

    It also builds a json array with all I/O to/from the API,
    that is returned to the caller.
    """
    all_data = []
    for index, row in df.iterrows():
        textstring = row['input_textstring']
        try:
            sight = merchant_sight(penta, textstring)
        except Exception:
            # Log the failure and move on to the next row.
            logging.exception('Request failed for %s', textstring)
            continue
        all_data.append({'input': textstring,
                         'sight': sight})
        # Put 1st only in the dataframe.
        if len(sight) > 0:
            # A single .loc[row, column] call writes into df itself;
            # df.loc[index]['col'] = ... would only modify a temporary copy.
            df.loc[index, 'output_name'] = sight[0]['name']
            df.loc[index, 'output_address'] = sight[0]['address']
        time.sleep(1) # be gentle with our API!
    return all_data
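The in-place fill is easiest to verify on a tiny frame. The sketch below uses made-up rows and a hypothetical fake_lookup function in place of the API call. It writes each result with a single df.loc[row, column] assignment, the form that targets the frame itself, whereas chained indexing such as df.loc[row][column] = value can end up writing to a temporary copy.

```python
import pandas as pd


def fake_lookup(textstring):
    """Hypothetical stand-in for the API call; returns made-up matches."""
    known = {'EXAMPLE COFFEE 0042': [{'name': 'Example Coffee',
                                      'address': '1 Main St'}]}
    return known.get(textstring, [])


df = pd.DataFrame({'input_textstring': ['EXAMPLE COFFEE 0042', 'UNKNOWN 123']})
# Create the output columns up front so unmatched rows stay empty.
df['output_name'] = None
df['output_address'] = None

for index, row in df.iterrows():
    sight = fake_lookup(row['input_textstring'])
    if len(sight) > 0:
        # One .loc call with (row label, column label) writes into df itself.
        df.loc[index, 'output_name'] = sight[0]['name']
        df.loc[index, 'output_address'] = sight[0]['address']

print(df)
```

Printing the frame after the loop is a quick way to confirm that the matched row was filled and the unmatched one was left empty.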

 

Read input, execute calls

With the two functions above, all that is left to do is:

  • Instantiate a PentaApi object.
  • Read your data from a file/database and put them in a DataFrame.
  • Save the results.
  • Analyze!

The first three steps can be written in a main block like the following one. For the fourth one, we’d be happy to advise if you get in touch.

Note: The CSV/DataFrame must have a column “input_textstring”.

Oh, don’t forget to get your API key; it’s free!

if __name__ == '__main__':
    # Fill in the info
    # #
    outfile = './output.json'
    outcsv = './output.csv'
    email = 'YOUR-EMAIL'
    api_key = 'YOUR-API-KEY'
    infile = './input_data.csv'
    # #

    penta = PentaApi(email, api_key)
    df = pd.read_csv(infile)
    df.drop_duplicates('input_textstring', inplace=True)

    alldata = predict(penta, df)
    # Store the results.
    with open(outfile, 'w') as writer:
        json.dump(alldata, writer)
    # Also store the updated dataframe.
    df.to_csv(outcsv)
    logging.debug('Done.')
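Since the whole run depends on the “input_textstring” column, a quick look at the CSV header before starting can give a clearer error than a failure deep inside the script. This is a sketch using only the standard library; the helper name has_input_column is hypothetical, and the sample rows are made up.

```python
import csv
import io

REQUIRED_COLUMN = 'input_textstring'


def has_input_column(csv_file) -> bool:
    """Return True if the CSV header contains the required column."""
    reader = csv.reader(csv_file)
    header = next(reader, [])  # an empty file yields an empty header
    return REQUIRED_COLUMN in header


# In-memory samples stand in for real files such as ./input_data.csv.
good = io.StringIO('id,input_textstring\n1,EXAMPLE COFFEE 0042\n')
bad = io.StringIO('id,description\n1,EXAMPLE COFFEE 0042\n')

print(has_input_column(good))
print(has_input_column(bad))
```

With a real file, the same helper can be called on the object returned by open(infile, newline='') before the data is loaded.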

 

Connecting the dots…

Here’s the entire script that you can simply copy and paste. It’s the union of the bits of code seen above, with the needed modules imported.

import time
import logging
import json

import pandas as pd

from pentadata_api import PentaApi

logging.basicConfig(format='%(asctime)s %(message)s', level=logging.DEBUG)


def merchant_sight(penta: PentaApi,
                   textstring: str,
                   city: str = None,
                   postal_code: str = None) -> list:
    """Request to Merchant Sight API; returns the list of matches."""
    payload = {
        'textstring': textstring,
        'city': city,
        'postal_code': postal_code
    }
    url = 'https://api.pentadatainc.com/merchants/sight'
    response = penta.post(url, json=payload)
    if response.status_code != 200:
        logging.error('Error >> %s', response.text)
        # Stop here: an error body has no 'sight' key to parse.
        raise RuntimeError('Merchant Sight request failed with status %s'
                           % response.status_code)
    data = response.json()['sight']
    logging.debug('%s >> %s', textstring, data)
    return data


def predict(penta: PentaApi, df: pd.DataFrame) -> list:
    """
    Fill the data frame in place with the top results from the API.

    It also builds a json array with all I/O to/from the API,
    that is returned to the caller.
    """
    all_data = []
    for index, row in df.iterrows():
        textstring = row['input_textstring']
        try:
            sight = merchant_sight(penta, textstring)
        except Exception:
            # Log the failure and move on to the next row.
            logging.exception('Request failed for %s', textstring)
            continue
        all_data.append({'input': textstring,
                         'sight': sight})
        # Put 1st only in the dataframe.
        if len(sight) > 0:
            # A single .loc[row, column] call writes into df itself;
            # df.loc[index]['col'] = ... would only modify a temporary copy.
            df.loc[index, 'output_name'] = sight[0]['name']
            df.loc[index, 'output_address'] = sight[0]['address']
        time.sleep(1) # be gentle with our API!
    return all_data


if __name__ == '__main__':
    # Fill in the info
    # #
    outfile = './output.json'
    outcsv = './output.csv'
    email = 'YOUR-EMAIL'
    api_key = 'YOUR-API-KEY'
    infile = './input_data.csv'
    # #

    penta = PentaApi(email, api_key)
    df = pd.read_csv(infile)
    df.drop_duplicates('input_textstring', inplace=True)

    alldata = predict(penta, df)
    # Store the results.
    with open(outfile, 'w') as writer:
        json.dump(alldata, writer)
    # Also store the updated dataframe.
    df.to_csv(outcsv)
    logging.debug('Done.')
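Once output.json exists, a first pass at the analysis step can be as simple as counting how many inputs got at least one match. The sketch below assumes the file holds the list built by predict (one object with 'input' and 'sight' per request) and uses made-up sample data in place of a real file; summarize is a hypothetical helper, not part of our API module.

```python
import json


def summarize(all_data):
    """Return (top-match rows, match rate) from predict()-style output."""
    rows = []
    for item in all_data:
        sight = item['sight']
        top = sight[0] if sight else {}
        rows.append({'input': item['input'],
                     'name': top.get('name'),
                     'address': top.get('address'),
                     'candidates': len(sight)})
    matched = sum(1 for r in rows if r['candidates'] > 0)
    rate = matched / len(rows) if rows else 0.0
    return rows, rate


# Made-up sample with the same shape as the JSON the script writes.
sample = json.loads('''[
  {"input": "EXAMPLE COFFEE 0042",
   "sight": [{"name": "Example Coffee", "address": "1 Main St"}]},
  {"input": "UNKNOWN 123", "sight": []}
]''')

rows, rate = summarize(sample)
print(rows[0]['name'], rate)
```

For a real run, the sample can be replaced with the result of json.load on the opened output file.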

 
