Showing posts with label Linq. Show all posts
Showing posts with label Linq. Show all posts

Tuesday, May 1, 2012

DataView ToTable Vs Linq to find distinct

One of my past post i mention that how to find distinct from DataTable (http://dotnetstep.in/find-distinct-value-from-dataset-datatable-in-net). Most of the time need is simple and we need to find out one or two column for distinct column but when need is complex and data is not unique than all columns are used to find distinct value of datatable.

When data in datatable is small is count ( Upto certain thousand) and it contains many duplicate record in that case performance is ok for dataview.ToTable but when no. of row increases performance decrease and in case dataView.ToTable, it is noticeable decrease in performance.
Performance depend on total no of record and total of distinct record that dataset/datatable going to return.

What is the solution ?
If you are using .net framework 3.5 SP1 or later then Linq to DataTable comes to rescue. DataTable.AsEnumerable().Distinct() is much faster than DataView ToTable functionality.
Here is performance matrix on sample data. ( You can see code that i used to measure performance later in this post)
NoOfItem/Distinct Item DataView ToTable Time (Milliseconds) Linq Distinct Time (Milliseconds)
100000/2000 13675 117
200000/2000 27271 231
100000/20000 176101 113
200000/20000 412053 483
100000/80000 777402 179
200000/80000 2249655 207
Seeing above result it is clear that Linq gives much better performance.

C# Code to find distinct using linq.
var items = dt.AsEnumerable().Distinct(System.Data.DataRowComparer.Default).ToList();   

Code used to measure performance.
class Program
    {        
        static System.Diagnostics.Stopwatch watch = new System.Diagnostics.Stopwatch();
        static void Main(string[] args)
        {
            Console.WriteLine(String.Format("{0},{1},{2},{3},{4}", "NoOfItem","Distinctcountwithdataview", "Distinctcountwithdataviewtime", 
                                                                   "WithoutdataviewdistinctCount", 
                                                                   "Withoutdataviewdistinctcounttime"));
            for (int i = 0; i < 1000; i+=100)
            {
                DataTable dtSample = GenerateTable(1000 * i, 80000);
                var withDataView = NormalDataSetWithDataView(dtSample);               
                var withoutDataView = NormalDataSetWithOutDataView(dtSample);
                Console.WriteLine(String.Format("{0},{1},{2},{3},{4}",1000*i,withDataView.Item1,withDataView.Item2,withoutDataView.Item1,withoutDataView.Item2));                
            }
            
            Console.ReadLine();
        }

        private static Tuple<int,long> NormalDataSetWithDataView(DataTable dt)
        {
            StopWatchEnable(watch, true);            
            DataView dv1 = new DataView(dt);
            DataTable dtNew1 = dv1.ToTable(true, "Id", "Name", "Role", "DataColumn1", "DataColumn2", "DataColumn3", "DataColumn4", "DataColumn5", "DataColumn6", "DataColumn7");           
            return new Tuple<int,long>(dtNew1.Rows.Count , StopWatchEnable(watch, false));
        }
        private static Tuple<int, long> NormalDataSetWithOutDataView(DataTable dt)
        {
            StopWatchEnable(watch, true);            
            var items = dt.AsEnumerable().Distinct(System.Data.DataRowComparer.Default).ToList();            
            return new Tuple<int, long>(items.Count, StopWatchEnable(watch, false));
        }

        private static DataTable GenerateTable(int noOfRows, int modules)
        {
            DataTable dt = new DataTable();
            dt.Columns.Add("Id", typeof(int));
            dt.Columns.Add("Name", typeof(string));
            dt.Columns.Add("Role", typeof(string));
            dt.Columns.Add("DataColumn1", typeof(string));
            dt.Columns.Add("DataColumn2", typeof(string));
            dt.Columns.Add("DataColumn3", typeof(string));
            dt.Columns.Add("DataColumn4", typeof(string));
            dt.Columns.Add("DataColumn5", typeof(string));
            dt.Columns.Add("DataColumn6", typeof(string));
            dt.Columns.Add("DataColumn7", typeof(string));
            for (int i = 0; i < noOfRows; i++)
            {
                int id = i % modules;
                dt.Rows.Add(new object[]
                              { id, 
                                "Test" + id.ToString(), 
                                "Test" + id.ToString(), 
                                "Test" + id.ToString(), 
                                "Test" + id.ToString(), 
                                "Test" + id.ToString(), 
                                "Test" + id.ToString(), 
                                "Test" + id.ToString(), 
                                "Test" + id.ToString(), 
                                "Test" + id.ToString() });
            }
            return dt;
        }

        private static long StopWatchEnable(System.Diagnostics.Stopwatch watch, bool enable)
        {
            if (enable)
            {
                watch.Restart();
                return 0;
            }
            else
            {
                watch.Stop();
                return watch.ElapsedMilliseconds;
            }
        }
    }



Why DataView.ToTable is slow ?
- I found that ToTable functionality implemented generic functionality to compare each type of row and the way compare data and also it return new result set with distinct field.

Note:
Above code consider that need to find distinct from all column. If you want to find based on certain field then implement custom IEqualityComparer for datarow.I haven’t tested but PLinq might give some performance improvement in Linq operation.

Suggestion are must welcome.

Thursday, May 14, 2009

DataTable / DataSet and Linq

1. Querying DataTable using Linq 
    Here i am going to explain how you can use linq to get data from DataTable.
    You need to use AsEnumerable() method of DataTable object to query datatable using linq.
Example :  
DataTable dt = new DataTable();
dt.Columns.Add("ID");
dt.Columns.Add("Name");
DataRow dr = dt.NewRow();
dr["ID"] = "1";
dr["Name"] = "Test1";
dt.Rows.Add(dr);
dr = dt.NewRow();
dr["ID"] = "2";
dr["Name"] = "Test2";
dt.Rows.Add(dr);

var result = from user in dt.AsEnumerable()
                     where user.Field<string>("ID") == "1"
                     select user;


2. Join Two DataTable using Linq.
     Example : Here i take two datatable.

        DataTable dt = new DataTable();
        dt.Columns.Add("ID");
        dt.Columns.Add("Name");
        DataRow dr = dt.NewRow();
        dr["ID"] = "1";
        dr["Name"] = "Test1";
        dt.Rows.Add(dr);
        dr = dt.NewRow();
        dr["ID"] = "2";
        dr["Name"] = "Test2";
        dt.Rows.Add(dr);

        DataTable dt1 = new DataTable();
        dt1.Columns.Add("ID");
        dt1.Columns.Add("Product");
        dr = dt1.NewRow();
        dr["ID"] = "1";
        dr["Product"] = "Test-Product";
        dt1.Rows.Add(dr);

Linq Query:

var c = from p in dt.AsEnumerable()
join d in dt1.AsEnumerable() on p.Field<string>("ID") equals d.Field<string>("ID")
select new { ID = p.Field<string>("ID"), Name = p.Field<string>("Name"), Product = d.Field<string>("Product") };

3. Except and Intersect operation between two DataTable using Linq .

        DataTable dt = new DataTable();
        dt.Columns.Add("ID");
        dt.Columns.Add("Name");
        DataRow dr = dt.NewRow();
        dr["ID"] = "1";
        dr["Name"] = "Test1";
        dt.Rows.Add(dr);
        dr = dt.NewRow();
        dr["ID"] = "2";
        dr["Name"] = "Test2";
        dt.Rows.Add(dr);

        DataTable dt1 = new DataTable();
        dt1.Columns.Add("ID");
        dt1.Columns.Add("Name");
        dr = dt1.NewRow();
        dr["ID"] = "1";
        dr["Name"] = "Test1";
        dt1.Rows.Add(dr);

        var  exceptresult = dt.AsEnumerable().Except(dt1.AsEnumerable(), System.Data.DataRowComparer.Default);

         var intersectresult  = dt.AsEnumerable().Intersect(dt1.AsEnumerable(), System.Data.DataRowComparer.Default);

For both of above operation you need to specify DataRowComparer . System.Data.DataRowComparer is a static sealed class and its static property Default contain comparer reference.  You must have to specify comparer for this operation on Reference datatype.

Next blog i will give you CustomDataRow Comparer to make it more efficient.

Sunday, January 18, 2009

XLinq To Generate Batch Element For SharePoint ListService

In my article (http://dotnetstep.blogspot.com/2009/01/update-datetime-column-using-webservice.html) i used SharePoint List Service to update SharePoint ListItems. In that article i manually set batchElement.InnerXml property with Xml.

When there is to many row and you want to generate it automatically then you can use XLinq.

In following example first select all items from orders List then use item id and update each item orderdate and orderdatetime column with current date and current datetime respectively.

Lists lst = new Lists();
lst.Url = “http://<your site>/_vti_bin/lists.asmx”;
lst.Credentials = new System.Net.NetworkCredential("test", "test"); // Site Administrator username and password
XmlNode resultNode = lst.GetListItems("Orders", String.Empty, null, null, int.MaxValue.ToString(), null, String.Empty);

// Use XElement To Generate Batch Element Automatically

XElement orders = XElement.Parse(resultNode.OuterXml);
XName name = XName.Get("data","urn:schemas-microsoft-com:rowset");
int methodsequence=1;
XElement updatexml =
new XElement("Batch", new XAttribute("OnError","Continue"),
from order in orders.Element(name).Elements()
// you can also use where condition over here
// for eg. where order.Attribure(“ows_ID”).value = “2”

select new  XElement("Method", 
new XAttribute("ID", (methodsequence++).ToString()), 
new XAttribute("Cmd", "Update"),
new XElement("Field", new XAttribute("Name", "ID"), order.Attribute("ows_ID").Value),
new XElement("Field",new XAttribute("Name","OrderDate"),DateTime.Now.SharepointFormatDate()),
new XElement("Field", new XAttribute("Name", "OrderDateTime"), DateTime.Now.SharepointFormatDateTime())));


Now Convert XElement to XmlNode as UpdateListItems only accept XmlNode as parameter.

XmlDocument doc=  new XmlDocument();
doc.LoadXml(updatexml.ToString());
lst.UpdateListItems("Orders",doc.FirstChild);

Sunday, January 11, 2009

SharePoint to Linq

Here specially talking about SharePoint and XLinq. When you choose SharePoint webservice to get data from SharePoint List, data return by web service is in XML format. To read this data as well as filter data XLinq is more powerful.

How to Use SharePoint webservice.
1. Add Web reference or generate proxy using WSDL tool.
then

Lists lst = new Lists();
lst.Url = “http://<your site>/_vti_bin/lists.asmx”;
lst.Credentials = new System.Net.NetworkCredential("test", "test");

Now use GetListItems to retrieve all items from SharePoint List. I would suggest take scenario into consideration. If you require less data to be return from SharePoint List then use Query in GetListItems function. ( Here concentration is on XLinq so it is out of scope for this post).

XmlNode resultNode = lst.GetListItems("Customers", String.Empty, null, null, int.MaxValue.ToString(), null, String.Empty);

XElement (System.Xml.Linq) to read result into XElement
XElement customers = XElement.Parse(resultNode.OuterXml);

XName to read type of Node.
XName name = XName.Get("data","urn:schemas-microsoft-com:rowset");

Read All Node Using Linq
var filtercustomers = from ele in customers.Element(name).Elements()                                
select new {CustomerID =  ele.Attribute("ows_ID").Value , Name = ele.Attribute("ows_CustomerName").Value , City = ele.Attribute("ows_CustomerCity").Value , Country = ele.Attribute("ows_CustomerCountry").Value};

Filtering Node using XLinq
var filtercustomers = from ele in customers.Element(name).Elements()   
where ele.Attribute("ows_CustomerCountry").Value == "India" && ele.Attribute("ows_CustomerCity").Value == "Banglore"                             
select new {CustomerID =  ele.Attribute("ows_ID").Value , Name = ele.Attribute("ows_CustomerName").Value , City = ele.Attribute("ows_CustomerCity").Value , Country = ele.Attribute("ows_CustomerCountry").Value};

Join Using XLinq

1. Join Two Result
This is use full case when join in needed on filter data set.
In following code customers and orders represent two sample reslut set.

XmlNode resultNode = lst.GetListItems("Customers", String.Empty, null, null, int.MaxValue.ToString(), null, String.Empty);
            XElement customers = XElement.Parse(resultNode.OuterXml);
            XName name = XName.Get("data","urn:schemas-microsoft-com:rowset");

            var filtercustomers = from ele in customers.Element(name).Elements()         
where ele.Attribute("ows_CustomerCountry").Value == "India" && ele.Attribute("ows_CustomerCity").Value == "Banglore"                       
select new {CustomerID =  ele.Attribute("ows_ID").Value , Name = ele.Attribute("ows_CustomerName").Value , City = ele.Attribute("ows_CustomerCity").Value , Country = ele.Attribute("ows_CustomerCountry").Value};

            // Retrive data From Orders
            resultNode = lst.GetListItems("Orders", String.Empty, null, null, int.MaxValue.ToString(), null, String.Empty);
            XElement orders = XElement.Parse(resultNode.OuterXml);
            var filterorders = from ele in orders.Element(name).Elements()                                 
                                  select new { OrderID = ele.Attribute("ows_ID").Value, CustomerID = ele.Attribute("ows_CustomerID").Value.Split(new string[]{";#"},StringSplitOptions.None)[0] , OrderDate = ele.Attribute("ows_OrderDate").Value };

Join Results

var joinresult = from customerele in filtercustomers
                     join orderele in filterorders on customerele.CustomerID equals orderele.CustomerID
                     select new { customerele, orderele };

2. Directly Join Two XElement

var directjoin = from customerele in customers.Element(name).Elements()
                             join orderele in orders.Elements(name).Elements() on customerele.Attribute("ows_ID").Value equals orderele.Attribute("ows_CustomerID").Value.Split(new string[] { ";#" }, StringSplitOptions.None)[0]
                             select new { CustomerID = customerele.Attribute("ows_ID").Value, Name = customerele.Attribute("ows_CustomerName").Value, City = customerele.Attribute("ows_CustomerCity").Value, Country = customerele.Attribute("ows_CustomerCountry").Value, OrderID = orderele.Attribute("ows_ID").Value, OrderDate = orderele.Attribute("ows_OrderDate").Value };

Note: When working with webservice do care while retrieve data. Whatever data needed from single list for operation try to retrieve in single webservice call instead of making many web service call.For my point of view this will improve the performance.

Tuesday, December 30, 2008

Linq to SharePoint List. (Where , Join )

This article give you information regarding how to use Liinq with SharePoint List to fetch data from SPList.

For Example there are three SharePoint List . Customers , Orders , OrderDetails. (Please See image below). In Orders list CustomerID is lookup field and in OrderDetails list OrderID is lookup field.

Customer

orders

Order Details

Now Get All Customer From Customer List using Linq.

SPSite site = new SPSite("http://localhost:45833"); 
SPWeb web = site.OpenWeb();
SPList lstCustomer = web.Lists["Customers"];

var customers = from SPListItem customer in lstCustomer.Items
                       select customer;

by use of SPListItem (As we use select customer in Linq query)
foreach (SPListItem item in customers)
{
Console.WriteLine(item.ID.ToString() + "-" +item["CustomerName"].ToString());
}

or by using ananymous type

var customers = from SPListItem customer in lstCustomer.Items
                       select new {ID = customer.ID.ToString(), Name= customer[“CustomerName”].ToString()};
foreach (var item in customers)
{
Console.WriteLine(item.ID + "-" + item.Name);
}

use of where clause with this

var customers = from SPListItem customer in lstCustomer.Items
                       where customer.ID == 1
                       select new { ID = customer.ID.ToString(), Name = customer["CustomerName"].ToString() };

Join Three List to get Related Data

SPList lstCustomer = web.Lists["Customers"];
SPList lstorders = web.Lists["Orders"];
SPList lstorderdetails = web.Lists["OrderDetails"];
var customerorders =

from SPListItem customer in lstCustomer.Items
join SPListItem order in lstorders.Items on customer.ID.ToString() equals order["CustomerID"].ToString().Split(new string[]{";#"},StringSplitOptions.None)[0]
join SPListItem orderdetail in lstorderdetails.Items on order.ID.ToString() equals orderdetail["OrderID"].ToString().Split(new string[] { ";#" }, StringSplitOptions.None)[0]                                
select new {Product = orderdetail["Product"].ToString() , Price = Convert.ToDouble(orderdetail["Price"].ToString()) , Qty = Convert.ToInt32(orderdetail["Qty"].ToString()) , OrderID = order.ID , OrderDate = order["OrderDate"].ToString() , CustomerID = customer.ID.ToString(), CustomerName = customer["CustomerName"].ToString() };


foreach (var item in customerorders)
               {
                   Console.WriteLine(item.Product + "  " + item.Price.ToString() + "," + item.Qty.ToString() + ","+ (item.Price * item.Qty).ToString() + ","+ item.OrderID.ToString() + "," + item.OrderDate + "," + item.CustomerName);
               }

                  Note :In above code, I used split operation on string in order to compare LookUp Field ID Value with ID of parent List. There is also posibility to join more than three tables.

For that you have to use nested this join inside other. Please give your comment or advice on this.