When data in datatable is small is count ( Upto certain thousand) and it contains many duplicate record in that case performance is ok for dataview.ToTable but when no. of row increases performance decrease and in case dataView.ToTable, it is noticeable decrease in performance.
Performance depend on total no of record and total of distinct record that dataset/datatable going to return.
What is the solution ?
If you are using .net framework 3.5 SP1 or later then Linq to DataTable comes to rescue. DataTable.AsEnumerable().Distinct() is much faster than DataView ToTable functionality.
Here is performance matrix on sample data. ( You can see code that i used to measure performance later in this post)
NoOfItem/Distinct Item | DataView ToTable Time (Milliseconds) | Linq Distinct Time (Milliseconds) |
100000/2000 | 13675 | 117 |
200000/2000 | 27271 | 231 |
100000/20000 | 176101 | 113 |
200000/20000 | 412053 | 483 |
100000/80000 | 777402 | 179 |
200000/80000 | 2249655 | 207 |
C# Code to find distinct using linq.
var items = dt.AsEnumerable().Distinct(System.Data.DataRowComparer.Default).ToList();
Code used to measure performance.
class Program { static System.Diagnostics.Stopwatch watch = new System.Diagnostics.Stopwatch(); static void Main(string[] args) { Console.WriteLine(String.Format("{0},{1},{2},{3},{4}", "NoOfItem","Distinctcountwithdataview", "Distinctcountwithdataviewtime",
"WithoutdataviewdistinctCount",
"Withoutdataviewdistinctcounttime")); for (int i = 0; i < 1000; i+=100) { DataTable dtSample = GenerateTable(1000 * i, 80000); var withDataView = NormalDataSetWithDataView(dtSample); var withoutDataView = NormalDataSetWithOutDataView(dtSample); Console.WriteLine(String.Format("{0},{1},{2},{3},{4}",1000*i,withDataView.Item1,withDataView.Item2,withoutDataView.Item1,withoutDataView.Item2)); } Console.ReadLine(); } private static Tuple<int,long> NormalDataSetWithDataView(DataTable dt) { StopWatchEnable(watch, true); DataView dv1 = new DataView(dt); DataTable dtNew1 = dv1.ToTable(true, "Id", "Name", "Role", "DataColumn1", "DataColumn2", "DataColumn3", "DataColumn4", "DataColumn5", "DataColumn6", "DataColumn7"); return new Tuple<int,long>(dtNew1.Rows.Count , StopWatchEnable(watch, false)); } private static Tuple<int, long> NormalDataSetWithOutDataView(DataTable dt) { StopWatchEnable(watch, true); var items = dt.AsEnumerable().Distinct(System.Data.DataRowComparer.Default).ToList(); return new Tuple<int, long>(items.Count, StopWatchEnable(watch, false)); } private static DataTable GenerateTable(int noOfRows, int modules) { DataTable dt = new DataTable(); dt.Columns.Add("Id", typeof(int)); dt.Columns.Add("Name", typeof(string)); dt.Columns.Add("Role", typeof(string)); dt.Columns.Add("DataColumn1", typeof(string)); dt.Columns.Add("DataColumn2", typeof(string)); dt.Columns.Add("DataColumn3", typeof(string)); dt.Columns.Add("DataColumn4", typeof(string)); dt.Columns.Add("DataColumn5", typeof(string)); dt.Columns.Add("DataColumn6", typeof(string)); dt.Columns.Add("DataColumn7", typeof(string)); for (int i = 0; i < noOfRows; i++) { int id = i % modules; dt.Rows.Add(new object[]
{ id,
"Test" + id.ToString(),
"Test" + id.ToString(),
"Test" + id.ToString(),
"Test" + id.ToString(),
"Test" + id.ToString(),
"Test" + id.ToString(),
"Test" + id.ToString(),
"Test" + id.ToString(),
"Test" + id.ToString() }); } return dt; } private static long StopWatchEnable(System.Diagnostics.Stopwatch watch, bool enable) { if (enable) { watch.Restart(); return 0; } else { watch.Stop(); return watch.ElapsedMilliseconds; } } }
Why DataView.ToTable is slow ?
- I found that ToTable functionality implemented generic functionality to compare each type of row and the way compare data and also it return new result set with distinct field.
Note:
Above code consider that need to find distinct from all column. If you want to find based on certain field then implement custom IEqualityComparer for datarow.I haven’t tested but PLinq might give some performance improvement in Linq operation.
Suggestion are must welcome.