Generic Graph Lib in Delphi, Speed of ObjectList


we are developing an generic graph library using DELPHI XE7. The target graph size is ~ 1 MIO vertex nodes and ~ 10 MIO edges in that graph. The current state of our progress is a basic implementation of TGraph class and some basic graph algorithm!.

The question goes in the speed on inserting data to the vertex and edgelist with has been implemented as an TObjectList . I can only put here some fragmental code, but it might help to get the relevant points of my question

  TGraph<Tdata> = Class
  private
    /// <summary>
    /// TVertexList<Tdata>;
    /// </summary>
    _vertices: TObjectList<TVertex<Tdata>>;
    /// <summary>
    /// TEdgesList<TData>;
    /// </summary>
    _edges: TObjectList<TEdge<Tdata>>;

    .....
    end;


function TGraph<Tdata>.addVertex(u: TVertex<Tdata>): Integer;
var
  u_index: Integer;
begin


  if (not _vertices.Contains(u)) then
  begin
    u_index := _vertices.Count;
    u.vertex_index := u_index;
    Result := _vertices.Add(u);
  end;

end;

procedure TGraph<Tdata>.addEdge(e: TEdge<Tdata>;
  checkDuplicates: Boolean = true);
var
  i, j: Integer;
begin

  if (checkDuplicates) then
    if (self._edges.Contains(e)) then  exit;


  ///  normal add egde stuff 
  ....


end;

As shown in the code we use the TList contains function to check wheather certain elements already exist in the graph. Speed analysis shows that the algorithm has a complexity of n2 which is fatal for us.

The questions goes now like that

  • a) is speed of list.contains(...) performance in Delphi really n² ?

    b) Any better function to check wheather an element exists or not?

    c) Instead of generic Object List can we change to any other more faster data type for building generic graphs?

    d) I know C++ Boost or METIS/HMETIS Graph libs, but I can#t not write on a short timeframe evaluation programs for these lib. Do they also suffer that N² problem on building large graphs?

drawing

enter image description here

this graph shows checkDuplicates = false as green curve and checkDuplicates = true condition as red curve for addegde...) .


Answers:


Is TObjectList<T>.Contains() really O(n2)?

No, it is O(n). The implementation of Contains() is simply a linear search. The graph you show is not linear, but then we don't know how you generated it. Presumably your code doesn't test purely the performance of Contains().

My guess is that you are measuring the population of your list. For your code that will be O(n2). Loosely speaking, adding items is O(n), and checking if they are in the list is also O(n) due to your choice of container, i.e. that you have to perform linear search on each add to the list. So put that together and you have O(n2) performance.

Any better function to check whether an element exists or not?

Not if your data is held in this type of container. A TObjectList<T> is a random access array in performance terms. You would need to choose a different container if you wanted better than O(n) search performance. For instance a TDictionary<K,V> would give you O(1) search performance.

Instead of TObjectList<T> can we change to any other more faster data type for building generic graphs?

See above.

I know C++ Boost or METIS/HMETIS Graph libs, but I can't write on a short timeframe evaluation programs for these lib. Do they also suffer that O(n2) problem on building large graphs?

I don't know. As I said, we don't know where your graph comes from so we can't actually tell what you are measuring. However, any decent library in this field is surely not going to perform a full linear search over all edges before adding a new one.