Stride AI

Tech Used: Python, OpenCV, NLTK, Pandas, NumPy, PDF Parsing, Tesseract, TensorFlow

I interned at Stride AI from June 2018 to August 2018. Stride AI is a Techstars-funded company that automates document-heavy tasks in the BFSI (banking, financial services, and insurance) space.

Stride.AI operates in the RPA (Robotic Process Automation) space, primarily serving financial institutions such as banks and credit rating agencies.

Robotic Process Automation helps companies on two fronts:

Most of Stride.AI’s automation workflow revolves around the analysis of unstructured financial documents, usually PDFs. The aim is to build SaaS and PaaS solutions for automated data extraction and localization, along with NLU (Natural Language Understanding) models and pattern-based heuristics for natural language querying.

Stride.AI’s primary clients include international banks headquartered in Brazil, Europe, Bangalore, and Japan, among others.

The company emphasizes research, especially in data-scarce natural language processing and document data analysis.

I was part of the Computer Vision team. My day-to-day work covered two things: developing a scalable, ML-based PDF parsing library component that could be used across all internal projects, and building a parsing pipeline to identify key data points in French and Dutch KYC documents using TensorFlow Object Detection and Tesseract OCR.
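To give a rough idea of what a detect-then-read pipeline like this looks like, here is a minimal sketch. It is not the code I wrote at Stride.AI: the names `Detection`, `crop`, `extract_fields`, and `min_score` are made up for illustration, and the detector and OCR engine are passed in as plain functions so the example runs without any models installed. In a real setup, `detect` would wrap a trained TensorFlow Object Detection model and `ocr` could wrap something like `pytesseract.image_to_string`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A grayscale page as rows of pixel values (a stand-in for a NumPy array).
Image = List[List[int]]
Box = Tuple[int, int, int, int]  # (top, left, bottom, right) in pixels


@dataclass
class Detection:
    """One region proposed by the object detector (hypothetical structure)."""
    label: str    # field name, e.g. "name" or "date_of_birth"
    box: Box      # where the field sits on the page
    score: float  # detector confidence in [0, 1]


def crop(image: Image, box: Box) -> Image:
    """Cut the boxed region out of the page."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def extract_fields(
    image: Image,
    detect: Callable[[Image], List[Detection]],
    ocr: Callable[[Image], str],
    min_score: float = 0.5,
) -> Dict[str, str]:
    """Locate each field with the detector, then read its text with OCR.

    Detections below min_score are ignored. If a label is detected more
    than once, only the highest-scoring region is kept.
    """
    fields: Dict[str, str] = {}
    best_score: Dict[str, float] = {}
    for det in detect(image):
        if det.score < min_score:
            continue
        if det.score <= best_score.get(det.label, -1.0):
            continue
        best_score[det.label] = det.score
        fields[det.label] = ocr(crop(image, det.box)).strip()
    return fields
```

Keeping the detector and the OCR step behind plain function arguments means either one can be swapped or tested on its own, which is one reasonable way to make such a component reusable across projects.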

In short, it was a fantastic experience! I learnt how production ML systems are built and maintained for scalability. I also learnt a lot about how documents are stored and represented in the Linux file system, and how to mine data out of them as efficiently as possible.

As part of my university's internship requirement, I wrote a research paper on this work. It can be found here.