One of my past post i mention that how to find distinct from DataTable (http://dotnetstep.in/find-distinct-value-from-dataset-datatable-in-net). Most of the time need is simple and we need to find out one or two column for distinct column but when need is complex and data is not unique than all columns are used to find distinct value of datatable.
When data in datatable is small is count ( Upto certain thousand) and it contains many duplicate record in that case performance is ok for dataview.ToTable but when no. of row increases performance decrease and in case dataView.ToTable, it is noticeable decrease in performance.
Performance depend on total no of record and total of distinct record that dataset/datatable going to return.
What is the solution ?
If you are using .net framework 3.5 SP1 or later then Linq to DataTable comes to rescue. DataTable.AsEnumerable().Distinct() is much faster than DataView ToTable functionality.
Here is performance matrix on sample data. ( You can see code that i used to measure performance later in this post)
Seeing above result it is clear that Linq gives much better performance.
C# Code to find distinct using linq.
Code used to measure performance.
Why DataView.ToTable is slow ?
- I found that ToTable functionality implemented generic functionality to compare each type of row and the way compare data and also it return new result set with distinct field.
Note:
Above code consider that need to find distinct from all column. If you want to find based on certain field then implement custom IEqualityComparer for datarow.I haven’t tested but PLinq might give some performance improvement in Linq operation.
Suggestion are must welcome.
When data in datatable is small is count ( Upto certain thousand) and it contains many duplicate record in that case performance is ok for dataview.ToTable but when no. of row increases performance decrease and in case dataView.ToTable, it is noticeable decrease in performance.
Performance depend on total no of record and total of distinct record that dataset/datatable going to return.
What is the solution ?
If you are using .net framework 3.5 SP1 or later then Linq to DataTable comes to rescue. DataTable.AsEnumerable().Distinct() is much faster than DataView ToTable functionality.
Here is performance matrix on sample data. ( You can see code that i used to measure performance later in this post)
NoOfItem/Distinct Item | DataView ToTable Time (Milliseconds) | Linq Distinct Time (Milliseconds) |
100000/2000 | 13675 | 117 |
200000/2000 | 27271 | 231 |
100000/20000 | 176101 | 113 |
200000/20000 | 412053 | 483 |
100000/80000 | 777402 | 179 |
200000/80000 | 2249655 | 207 |
C# Code to find distinct using linq.
var items = dt.AsEnumerable().Distinct(System.Data.DataRowComparer.Default).ToList();
Code used to measure performance.
class Program { static System.Diagnostics.Stopwatch watch = new System.Diagnostics.Stopwatch(); static void Main(string[] args) { Console.WriteLine(String.Format("{0},{1},{2},{3},{4}", "NoOfItem","Distinctcountwithdataview", "Distinctcountwithdataviewtime",
"WithoutdataviewdistinctCount",
"Withoutdataviewdistinctcounttime")); for (int i = 0; i < 1000; i+=100) { DataTable dtSample = GenerateTable(1000 * i, 80000); var withDataView = NormalDataSetWithDataView(dtSample); var withoutDataView = NormalDataSetWithOutDataView(dtSample); Console.WriteLine(String.Format("{0},{1},{2},{3},{4}",1000*i,withDataView.Item1,withDataView.Item2,withoutDataView.Item1,withoutDataView.Item2)); } Console.ReadLine(); } private static Tuple<int,long> NormalDataSetWithDataView(DataTable dt) { StopWatchEnable(watch, true); DataView dv1 = new DataView(dt); DataTable dtNew1 = dv1.ToTable(true, "Id", "Name", "Role", "DataColumn1", "DataColumn2", "DataColumn3", "DataColumn4", "DataColumn5", "DataColumn6", "DataColumn7"); return new Tuple<int,long>(dtNew1.Rows.Count , StopWatchEnable(watch, false)); } private static Tuple<int, long> NormalDataSetWithOutDataView(DataTable dt) { StopWatchEnable(watch, true); var items = dt.AsEnumerable().Distinct(System.Data.DataRowComparer.Default).ToList(); return new Tuple<int, long>(items.Count, StopWatchEnable(watch, false)); } private static DataTable GenerateTable(int noOfRows, int modules) { DataTable dt = new DataTable(); dt.Columns.Add("Id", typeof(int)); dt.Columns.Add("Name", typeof(string)); dt.Columns.Add("Role", typeof(string)); dt.Columns.Add("DataColumn1", typeof(string)); dt.Columns.Add("DataColumn2", typeof(string)); dt.Columns.Add("DataColumn3", typeof(string)); dt.Columns.Add("DataColumn4", typeof(string)); dt.Columns.Add("DataColumn5", typeof(string)); dt.Columns.Add("DataColumn6", typeof(string)); dt.Columns.Add("DataColumn7", typeof(string)); for (int i = 0; i < noOfRows; i++) { int id = i % modules; dt.Rows.Add(new object[]
{ id,
"Test" + id.ToString(),
"Test" + id.ToString(),
"Test" + id.ToString(),
"Test" + id.ToString(),
"Test" + id.ToString(),
"Test" + id.ToString(),
"Test" + id.ToString(),
"Test" + id.ToString(),
"Test" + id.ToString() }); } return dt; } private static long StopWatchEnable(System.Diagnostics.Stopwatch watch, bool enable) { if (enable) { watch.Restart(); return 0; } else { watch.Stop(); return watch.ElapsedMilliseconds; } } }
Why DataView.ToTable is slow ?
- I found that ToTable functionality implemented generic functionality to compare each type of row and the way compare data and also it return new result set with distinct field.
Note:
Above code consider that need to find distinct from all column. If you want to find based on certain field then implement custom IEqualityComparer for datarow.I haven’t tested but PLinq might give some performance improvement in Linq operation.
Suggestion are must welcome.
No comments:
Post a Comment