Tools and Libraries for Computer Vision

Frameworks for Object Detection

Chinmay Wyawahare
Analytics Vidhya

--

Computer Vision Libraries and Frameworks

TensorFlow:

TensorFlow (TF) is one of the most widely used frameworks for machine learning (ML) tasks. It is widely adopted because it provides both an interface for expressing common ML algorithms and an implementation for executing the resulting models. Models created in TF can be ported to heterogeneous systems with little or no change, with target devices ranging from mobile phones to distributed servers. TF was created and is maintained by Google, which uses it internally for its own ML workloads.

TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. It uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of a dataflow graph across many machines in a cluster, and within a machine across multiple computational devices, including multicore CPUs, general-purpose GPUs, and custom-designed ASICs known as Tensor Processing Units (TPUs).
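The dataflow idea can be illustrated with a toy evaluator in plain Python: nodes are operations, edges carry values between them, and shared intermediate results are computed once and reused. This is a conceptual sketch of the model, not TensorFlow's actual API.

```python
# Conceptual sketch of a dataflow graph: each node names an operation
# and the nodes whose outputs feed it. Not TensorFlow's real API.

def eval_node(graph, name, feeds, cache=None):
    """Recursively evaluate node `name`, reusing already-computed nodes."""
    cache = {} if cache is None else cache
    if name in feeds:                      # input supplied by the caller
        return feeds[name]
    if name in cache:                      # shared state: compute only once
        return cache[name]
    op, inputs = graph[name]
    args = [eval_node(graph, i, feeds, cache) for i in inputs]
    cache[name] = op(*args)
    return cache[name]

# y = (a + b) * (a + b) -- the "sum" node is evaluated only once
graph = {
    "sum":  (lambda x, y: x + y, ["a", "b"]),
    "prod": (lambda x, y: x * y, ["sum", "sum"]),
}
print(eval_node(graph, "prod", {"a": 2.0, "b": 3.0}))  # 25.0
```

In real TensorFlow the same graph structure is what lets the runtime place different nodes on different devices or machines.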

TensorFlow Lite:

Google designed TF to run on heterogeneous systems, including mobile devices. The motivation was to avoid sending data back and forth between devices and data centres when the computation could instead be executed on the device itself. TensorFlow Mobile (TFM) enabled developers to create interactive applications without incurring network round-trip delays for ML computations.

As ML tasks are computationally expensive, model optimisation is needed to achieve acceptable performance. TFM's minimum hardware requirements in terms of Random Access Memory (RAM) size and CPU speed are low; the primary bottleneck is raw computation speed, since the latency users expect from mobile applications is low.

TensorFlow Lite (TFL) is the evolution of TFM and is now the recommended way to deploy models on mobile and embedded devices. This matters because there is a clear trend towards incorporating ML in mobile applications, and users have rising expectations of those applications in terms of camera and voice features. Optimisations included in TFL span hardware acceleration at the silicon layer, frameworks such as the Android Neural Networks API, and mobile-optimised architectures such as MobileNets and SqueezeNet. Models trained in TF are converted to the TFL format with the TFL converter.
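Why architectures such as MobileNets are "mobile-optimised" can be seen with simple arithmetic: replacing a standard convolution with a depthwise separable one (a depthwise step followed by a 1×1 pointwise step) shrinks the parameter count dramatically. The layer sizes below are illustrative, not taken from the actual MobileNet architecture.

```python
# Parameter counts for a single convolutional layer, ignoring biases.
# Illustrative sizes; not the actual MobileNet configuration.
k, c_in, c_out = 3, 256, 256       # kernel size, input/output channels

standard = k * k * c_in * c_out              # one dense 3x3 convolution
separable = k * k * c_in + c_in * c_out      # depthwise 3x3 + pointwise 1x1

print(standard, separable, round(standard / separable, 1))
# 589824 67840 8.7  -- roughly 9x fewer parameters for this layer
```

The same reduction applies to the multiply-accumulate count, which is what drives latency on a phone.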

CUDA and cuDNN:

CUDA was developed by NVIDIA as an interface for parallel computing on CUDA-enabled GPUs. The platform functions as a software layer for general-purpose computation that developers use to execute the GPU's virtual instruction set, and it supports many programming languages.

The NVIDIA CUDA Deep Neural Network library (cuDNN) enables GPU-accelerated training and inference of deep neural networks by providing tuned implementations of common ML routines and operations. As ML heavily depends on access to computational power, this is crucial when training larger networks or training on high-dimensional data such as images. The cuDNN library handles low-level GPU performance tuning, letting ML developers focus on implementing their networks. cuDNN supports and accelerates operations in TensorFlow.
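The routines cuDNN accelerates are primitives such as the 2-D convolution below; a naive NumPy version makes it clear why a tuned GPU implementation matters. This illustrates the operation itself, not cuDNN's API.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 'valid' 2-D convolution (strictly, cross-correlation, as in
    most DL frameworks). cuDNN provides highly tuned GPU kernels for this."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow), dtype=image.dtype)
    for i in range(oh):                       # O(oh*ow*kh*kw) -- the cost
        for j in range(ow):                   # cuDNN exists to hide
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

img = np.arange(16, dtype=float).reshape(4, 4)
edge = np.array([[1.0, -1.0]])    # horizontal gradient filter
print(conv2d_valid(img, edge))    # every horizontal step is 1, so all -1.0
```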

Caffe:

Caffe provides a complete toolkit for training, testing, fine tuning, and deploying models, with well-documented examples for all of these tasks. As such, it is an ideal starting point for researchers and other developers looking to jump into state-of-the-art machine learning. At the same time, it is likely the fastest available implementation of these algorithms, making it immediately useful for industrial deployment.

In Caffe, multimedia scientists and practitioners have an orderly and extensible toolkit for state-of-the-art deep learning algorithms, with reference models provided out of the box. Fast CUDA code and GPU computation fit industry needs by achieving processing speeds of more than 40 million images per day on a single K40 or Titan GPU. The same models can be run in CPU or GPU mode on a variety of hardware: Caffe separates the representation from the actual implementation, and seamless switching between heterogeneous platforms furthers development and deployment. Caffe can even be run in the cloud.
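The separation of representation from implementation comes from Caffe defining models declaratively in plain-text protobuf files; the same definition then runs in CPU or GPU mode with a single flag. A minimal, hypothetical layer definition might look like:

```protobuf
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"     # input blob
  top: "conv1"       # output blob
  convolution_param {
    num_output: 32
    kernel_size: 3
    stride: 1
  }
}
```

Because the model is data rather than code, the same file can be trained, fine-tuned, or deployed without changes.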

Theano:

Theano allows a user to symbolically define mathematical expressions and have them compiled in a highly optimized fashion either on CPUs or GPUs (the latter using CUDA), just by modifying a configuration flag. Furthermore, it can automatically compute symbolic differentiation of complex expressions, ignore the variables that are not required to compute the final output, reuse partial results to avoid redundant computations, apply mathematical simplifications, compute operations in place when possible to minimize the memory usage, and apply numerical stability optimization to overcome or minimize the error due to hardware approximations. To achieve this, the mathematical expressions defined by the user are stored as a graph of variables and operations, that is pruned and optimized at compilation time.

The interface to Theano is Python, a powerful and flexible language that allows for rapid prototyping and provides a fast and easy way to interact with the data. The downside of Python is its interpreter, which is in many cases a poor engine for executing mathematical calculations, both in terms of memory usage and speed. Theano overcomes this limitation by combining the compactness and flexibility of the Python language with a fast and optimized computation engine.
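The automatic differentiation Theano performs can be sketched with a tiny expression graph: each operation records how to propagate derivatives to its inputs, so the gradient of a composite expression comes for free. This is a conceptual sketch of the idea, not Theano's API (Theano additionally prunes and optimises the graph at compile time).

```python
# Minimal reverse-mode autodiff over an expression graph -- a sketch of
# the symbolic differentiation Theano automates.

class Var:
    def __init__(self, value, grad_fn=None):
        self.value = value
        self.grad = 0.0
        self.grad_fn = grad_fn or (lambda g: None)   # leaves do nothing

def add(a, b):
    def back(g):          # d(a+b)/da = 1, d(a+b)/db = 1
        a.grad += g; a.grad_fn(g)
        b.grad += g; b.grad_fn(g)
    return Var(a.value + b.value, back)

def mul(a, b):
    def back(g):          # d(a*b)/da = b, d(a*b)/db = a
        a.grad += g * b.value; a.grad_fn(g * b.value)
        b.grad += g * a.value; b.grad_fn(g * a.value)
    return Var(a.value * b.value, back)

x = Var(3.0)
y = mul(x, add(x, Var(2.0)))   # y = x * (x + 2) = x^2 + 2x
y.grad_fn(1.0)                 # backpropagate dy/dy = 1
print(y.value, x.grad)         # 15.0 and dy/dx = 2x + 2 = 8.0
```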

Keras:

Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. It was developed with a focus on enabling fast experimentation. Advantages of using Keras are:

  • Allows for easy and fast prototyping (through user friendliness, modularity, and extensibility)
  • Supports both convolutional networks and recurrent networks, as well as combinations of the two
  • Runs seamlessly on CPU and GPU
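A few lines show the level of abstraction Keras targets. The layer sizes and random data below are arbitrary illustrations, not a recommended architecture.

```python
import numpy as np
from tensorflow import keras   # Keras on the TensorFlow backend

# Tiny fully connected regression model; sizes are arbitrary.
model = keras.Sequential([
    keras.layers.Dense(4, activation="relu"),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

x = np.random.rand(16, 8).astype("float32")   # 16 samples, 8 features
y = np.random.rand(16, 1).astype("float32")
model.fit(x, y, epochs=1, verbose=0)          # one pass over the toy data

print(model.count_params())   # (8*4 + 4) + (4*1 + 1) = 41
```

Swapping in a convolutional or recurrent model is a matter of changing the layer list, which is what "easy and fast prototyping" means in practice.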

PyTorch:

PyTorch is a Python package that provides two high-level features:

  • Tensor computation (like NumPy) with strong GPU acceleration
  • Deep neural networks built on a tape-based autograd system
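Both features show up in a few lines; this is a minimal illustration:

```python
import torch

# Tensor computation with NumPy-like semantics
# (GPU-accelerated if moved to a CUDA device).
a = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
b = a @ a                        # matrix multiply

# Tape-based autograd: operations are recorded as they execute.
x = torch.tensor(2.0, requires_grad=True)
y = x ** 3 + 2 * x               # y = x^3 + 2x
y.backward()                     # dy/dx = 3x^2 + 2

print(b[0, 0].item(), y.item(), x.grad.item())  # 7.0 12.0 14.0
```

The "tape" is what makes PyTorch define-by-run: the graph is rebuilt on every forward pass, so ordinary Python control flow works inside the model.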

