Channel: Supreme Debugging

Numpy array append and performance issues


Issue

Assume you need to add 100 elements to a numpy array. The initial intuition is to write a loop and append each element to the array.

However, this would be a grave mistake in terms of the scalability of the code. Assume that the number of elements changes from 100 to 50000 - what would you expect? Let's see with the example below.


Comparing numpy append vs list append 

Check out this example (link), which runs through adding elements
  1. directly to the numpy array - using the append API provided by numpy
  2. indirectly, by building the numpy array from a list - using a list to collect the elements and converting the list to an array using the numpy array API.
We see that Option (2) is much more efficient than Option (1). And the more elements there are to add to the numpy array, the larger the advantage of Option (2).
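The linked example is not reproduced here, so the following is only a minimal sketch of what such a comparison might look like. The function names (build_with_np_append, build_with_list, time_call) and the element counts are assumptions chosen for illustration, not taken from the original script, and the timings will differ from machine to machine.

```python
import time

import numpy as np


def build_with_np_append(n):
    """Option 1: grow the array one element at a time with np.append."""
    arr = np.array([], dtype=np.int64)
    for i in range(n):
        # np.append returns a new array; the old one is not extended.
        arr = np.append(arr, i)
    return arr


def build_with_list(n):
    """Option 2: collect the elements in a list, convert once at the end."""
    items = []
    for i in range(n):
        items.append(i)
    return np.array(items, dtype=np.int64)


def time_call(func, n):
    """Return (elapsed seconds, result) for a single call of func(n)."""
    start = time.perf_counter()
    result = func(n)
    return time.perf_counter() - start, result


if __name__ == "__main__":
    # Hypothetical sizes, kept small so the sketch runs quickly.
    for n in (1000, 5000, 10000):
        t_a, a = time_call(build_with_np_append, n)
        t_b, b = time_call(build_with_list, n)
        assert np.array_equal(a, b)  # both options build the same array
        print(f"n={n}: np.append {t_a:.4f}s, list then convert {t_b:.4f}s")
```

A single timed call per size is noisy; for steadier numbers the standard library timeit module could be used instead.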




Result Analysis

Let's look at the sample results obtained in the run


Num of elements   A: direct numpy append (sec)   B: list, then convert (sec)   A slower than B by
10000             0.112673044205                 0.00116991996765              96 times
20000             0.218866825104                 0.00390100479126              56 times
30000             0.721004962921                 0.00337600708008              213 times
40000             1.38639616966                  0.00489687919617              283 times
50000             2.46702504158                  0.00590395927429              417 times


So we see that as the number of elements to insert increases, the factor by which approach A is slower than B grows drastically overall, from 96 times at 10000 elements to 417 times at 50000 (as depicted by the graph below).



Reason

Now, let's look at the API details of numpy append. When we look into the details, we find a small note -

Note that append does not occur in-place: a new array is allocated and filled. If axis is None, out is a flattened array.
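So, unlike a Python list, a numpy array cannot grow in place. (A Python list is not a linked list either; it is a dynamic array that reserves spare capacity, so most appends only write into a slot that is already allocated.) A numpy array has a fixed size, so every time we add an element with numpy append, a new array is allocated and all existing contents are copied into it along with the new element. Adding n elements one at a time therefore copies on the order of n^2 elements in total, which is a very costly operation.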
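The note from the numpy documentation can be checked directly. The sketch below shows that np.append leaves the original array untouched and returns a separate array, and then counts how many already-present elements get copied when an array is grown one element at a time. The helper existing_elements_copied is a hypothetical name introduced for illustration, and the count is a simple model of the copying, not a measurement.

```python
import numpy as np

original = np.array([1, 2, 3])
appended = np.append(original, 4)

# The original array still has 3 elements; the result is a separate array.
print(original.size, appended.size)
print(np.shares_memory(original, appended))  # False: no shared buffer


def existing_elements_copied(n):
    """Model of the copy cost of n single-element np.append calls.

    The call made when the array holds k elements copies those k
    elements into the new array, so the total is 0 + 1 + ... + (n - 1).
    """
    return n * (n - 1) // 2


for n in (10000, 50000):
    print(n, existing_elements_copied(n))
```

Under this model, growing to 50000 elements copies 1,249,975,000 existing elements, against roughly 50000 element writes when the list is converted once.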

So when you need to build a numpy array element by element, build it first as a Python list and then convert it to a numpy array once at the end.
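As a small sketch of that advice, the helper below collects values from any iterable into a list and converts the list once. The name collect_to_array and its dtype parameter are assumptions made for this example, not part of numpy.

```python
import numpy as np


def collect_to_array(values, dtype=float):
    """Collect values in a Python list, then convert to numpy once.

    Hypothetical helper: 'values' can be any iterable, including one
    whose length is not known in advance.
    """
    items = []
    for value in values:
        items.append(value)  # cheap: the list grows in place
    return np.array(items, dtype=dtype)  # single allocation and copy


samples = collect_to_array(x * 0.5 for x in range(5))
print(samples)
```

The conversion happens exactly once, so the cost of copying is paid a single time regardless of how many elements were collected.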


