CSV to JSON Converter

Hi everyone!

Labelbox is deprecating CSV-formatted uploads in favor of JSON-formatted uploads. Although CSV uploads will no longer be supported, we have created the following script to convert CSV files into JSON files that can be uploaded into Labelbox:

"""csv_to_json.py

Script that takes a CSV file and converts it into the Labelbox JSON syntax for upload. Saves the JSON at the path given by json_file_name.

How to Run:
python3 csv_to_json.py -csv_file_name "" -json_file_name "" -row_data_column "" -external_id_column ""

Args:
csv_file_name (str)       :       Path to the CSV file to be converted, relative to the directory you're in
json_file_name (str)      :       Path (including the file name) where the JSON should be saved, relative to the directory you're in
row_data_column (str)     :       The name of the column that contains the row data to be uploaded
external_id_column (str)  :       The name of the column that contains the external ID of the data row
Returns:
Nothing; writes the newly created JSON file to json_file_name
"""

import argparse
import json

import pandas as pd

def csv_to_json(csv_file_name: str, json_file_name: str, row_data_column="row_data", external_id_column="external_id"):
    # Read the whole CSV into a DataFrame
    csv_file = pd.read_csv(csv_file_name)
    json_list = []
    # Build one {"row_data", "external_id"} dictionary per CSV row
    for _, row in csv_file.iterrows():
        json_list.append({"row_data": row[row_data_column], "external_id": row[external_id_column]})
    # Write the list of dictionaries out as a JSON array
    with open(json_file_name, 'w') as f:
        json.dump(json_list, f)
    print(f"Converted csv {csv_file_name} to json, saved at {json_file_name}")

if __name__ == "__main__":
    argparser = argparse.ArgumentParser()
    # The two file paths are required; the column names fall back to the function defaults
    argparser.add_argument("-csv_file_name", type=str, required=True)
    argparser.add_argument("-json_file_name", type=str, required=True)
    argparser.add_argument("-row_data_column", type=str, default="row_data")
    argparser.add_argument("-external_id_column", type=str, default="external_id")
    args = argparser.parse_args()
    csv_to_json(args.csv_file_name, args.json_file_name, args.row_data_column, args.external_id_column)

This script will take the user’s CSV file and convert it into a JSON file to be uploaded into Labelbox. The converter takes 4 arguments that the user will provide:

  1. csv_file_name: the CSV file path that will be converted into the JSON file

  2. json_file_name: the path where the JSON file will be saved / the (user’s selected) name of the new JSON file

  3. row_data_column: the CSV column name containing the row data to be uploaded

  4. external_id_column: the CSV column name containing the external ID of the data row

The csv_to_json function takes these four arguments and writes the new JSON file to json_file_name; it does not return a value. After reading in the user’s CSV file, the function iterates through each row of the CSV and appends a dictionary holding that row’s row data and external ID to a list. Each row in the CSV becomes one element in the list, and the list is then written out as the JSON file that will be uploaded into Labelbox.
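To make the mapping concrete, here is a minimal sketch of the same row-by-row conversion on a small in-memory CSV. It uses the standard library's csv.DictReader instead of pandas so it runs without extra dependencies, so it is an illustration rather than the script itself. The function name rows_to_labelbox_json, the column names image_url and asset_id, and the sample URLs are all made up for the example.

```python
import csv
import io
import json

# Hypothetical CSV content; "image_url" and "asset_id" are made-up column names.
SAMPLE_CSV = """image_url,asset_id
https://example.com/cat.jpg,cat-001
https://example.com/dog.jpg,dog-002
"""


def rows_to_labelbox_json(csv_text, row_data_column, external_id_column):
    """Map each CSV row to a {"row_data", "external_id"} dictionary."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {"row_data": row[row_data_column], "external_id": row[external_id_column]}
        for row in reader
    ]


converted = rows_to_labelbox_json(SAMPLE_CSV, "image_url", "asset_id")
# Each CSV row becomes one element of the JSON array
print(json.dumps(converted, indent=2))
```

Whatever the columns are called in the CSV, the keys in the output are always row_data and external_id; only the values come from the columns you name.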

Steps to Use:

  1. Save the .py file locally

  2. Open a terminal and use the command line to change to your working directory
    a. For example, in a Mac terminal, you change directories with the cd command

  3. Run the following command, filling in the arguments between the quotes:
    python3 path_to_py_file/csv_to_json.py -csv_file_name "" -json_file_name "" -row_data_column "" -external_id_column ""
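After running the steps above, it can be worth sanity-checking the output before uploading. The following is a minimal sketch, not part of the original script: check_converted_json is a hypothetical helper, and the demo file contents are made up. It only confirms that the file is a JSON list whose entries carry the row_data and external_id keys that the converter writes.

```python
import json
import os
import tempfile


def check_converted_json(json_path):
    """Return the number of entries, raising ValueError on a malformed file."""
    with open(json_path) as f:
        entries = json.load(f)
    if not isinstance(entries, list):
        raise ValueError("expected a JSON list of data rows")
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            raise ValueError(f"entry {i} is not a JSON object")
        missing = {"row_data", "external_id"} - set(entry)
        if missing:
            raise ValueError(f"entry {i} is missing keys: {sorted(missing)}")
    return len(entries)


# Demo on a throwaway file with made-up contents
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.json")
    with open(demo_path, "w") as f:
        json.dump([{"row_data": "https://example.com/a.jpg", "external_id": "a"}], f)
    n_entries = check_converted_json(demo_path)
    print(f"{n_entries} data row(s) look well-formed")
```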

Please reach out to Labelbox Support with any questions or issues.

Steven
Labelbox

Adding to that: depending on the content of your .csv file, you might run into an encoding-related conversion issue.

To work around it, you can pass an encoding to pd.read_csv when creating the csv_file variable:

csv_file = pd.read_csv(csv_file_name, encoding='Latin-1')
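If you are not sure which encoding a file uses, one option is to try a few candidates in order and keep the first one that decodes cleanly. This is a rough sketch under assumptions: pick_encoding is a hypothetical helper, the candidate list is only an example, and the demo file is made up.

```python
import os
import tempfile


def pick_encoding(path, candidates=("utf-8", "latin-1")):
    """Return the first candidate encoding that decodes the whole file."""
    with open(path, "rb") as f:
        raw = f.read()
    for encoding in candidates:
        try:
            raw.decode(encoding)
            return encoding
        except UnicodeDecodeError:
            continue
    raise ValueError(f"none of {candidates} could decode {path}")


# Demo: a made-up file containing the byte 0xE9 ("é" in Latin-1), which is invalid UTF-8 here
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.csv")
    with open(demo_path, "wb") as f:
        f.write("row_data,external_id\ncaf\u00e9.jpg,1\n".encode("latin-1"))
    chosen = pick_encoding(demo_path)
    print(chosen)
```

The result could then be passed along as pd.read_csv(csv_file_name, encoding=chosen). Keep in mind that Latin-1 accepts any byte sequence, so it never raises an error but can silently produce wrong characters if the file is really in a different encoding.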