Sunday, 5 March 2017

Introduction to R

One fine Sunday evening, I was having a casual conversation with my friend, Avinash. He is an R-certified trainer and is providing training for past 2 years. His main area of expertise is in Bio-medical field. He has also assisted/helped students pursuing their Ph.D. in medical research. Looking at his expertise, I have invited him to write some blogs on R here. Hope, we will see more quality content in future for R

Being a Microsoft stack developer, I never really took a keen interest in his technology stack. But then he told me about how Microsoft is supporting R since 2014 and that Microsoft has their own flavor of R.
Microsoft R Open, formerly known as Revolution R Open (RRO), is the enhanced distribution of R from Microsoft Corporation. It is a complete open source platform for statistical analysis and data science.                                                      -- excerpt from the website
So, after his half an hour lecture filled with inspiration, and a couple of hours of googling and youtube-ing, I decided to have at least basic idea about R. It sure looks like one of the top technology for data analysis in the market. Needs no extra settings. Download R-Studio or editor of your choice, and you are ready to get your hands dirty. More thorough articles will be published soon(hopefully by Avinash), But I would focus on the points which took me a while to run or the concepts which were difficult to grasp. Before we move ahead, let me warn you that R is not your regular programming language. It is very crisp and only for data analysis purpose(IMHO).

As R is for data analysis, very first thing you want to have "data" to work upon in your environment. Let's assume, we have a file called aapl.csv which contains daily closing prices of Apple's stock, Each line on the sheet contains a date and a closing price separated by a comma. The first line contains the column headings (Date and Close).to load CSV data in a variable. If we have your file in the current working directory, we can simply use this command.
read.csv - Reads a file in table format and creates a data frame from it, with cases corresponding to lines and variables to fields in the file. 
Output -

To know your working directory (and where you need to put the CSV file), try getwd()
getwd - getwd returns an absolute filepath representing the current working directory of the R process; setwd(dir) is used to set the working directory to dir.

To load files which are not in our working directory, use fully-qualified-file-path instead.


To know about any function in details, use ?FunctionName or help(FunctionName) . For example, to know about head() and tail(), we can use ?head or help(tail) in R-studio.
Returns the first or last parts of a vector, matrix, table, data frame or function. Since head() and tail() are generic functions, they may also have been extended to other classes.
Now that we have data with us, we can do various things and display data in various graphs, charts etc. Data visualization is one more awesome feature of R.
That's it for now. I guess, I will get stuck in R more often than ASP.Net and will have more to explore and write.

As always, feel free to provide your valuable suggestions and comments.

Saturday, 7 January 2017

Automate Excel Operations -2 - Deal with excel using Microsoft Interop


This is the second post of the series - "Automate Excel Operations".
If you wish to take look at previous article, here is the list -
  1. Read Excel Data into DataTable
In this post, we will create some of the functions to deal with excel.
But before jumping to that, We will create a class "InteropHelper.cs". This class will act as a wrapper to interact with actual Interop DLL provided by Microsoft.



Add some Interop properties in this class. This will help us to dispose of all the objects related to Interop DLL. It is very important to dispose the Interop objects properly or else we can get into some serious trouble. We will come to the dispose part a few moment later. These properties are just the one we needed for this post. We can add as many as we require.


And here comes the most important part IMO. Disposing of the objects. If we don't dispose of them properly, there will be an excel.exe running each time we open an excel file with Interop.So do whatever you want, don't forgot to dispose of all the properties you have declared in your code.


Below is the definition of ReleaseObject(). This is how we dispose of ComObject



So let's go with the first one.
1. CheckIfSheetExists - This function allows us to check if the desired exists in the excel file before doing any other operation. This function helps us to avoid the non-informational "HResult" exceptions getting thrown from Interop DLL, I will post the exact exception message once I get one. :-)


2. GetFirstSheetName - In many cases, while reading data from excel, if the user hasn't passed sheet name in function, or first sheet name is not fixed, Use this function to get the name before reading the excel.


3. GetLastRowColumnIndex - To get the value of last row/column where data is present in excel sheet. Note following points -
  • This function does not take into consideration hidden row/column in the sheet. 
  • We remove the filter before calculating the row/column to consider all the data.
  • Blank cells(with no data) are also taken into consideration. (Sometimes, we accidentally add blank rows/columns. For us, there is no data but excel consider these cells as actual data). Try removing unwanted cells if the count is not as expected.


4. DeleteSheets - To delete the sheets in excel file. Keep in mind that we can't just delete all the sheets in the excel. There should be at least one sheet available. That's why we have "forAll" and "defaultSheetName" parameters. Check description in the image below -


5. SetSheetVisibility - To show/hide the sheets in excel file. Keep in mind that we can't just hide all the sheets in the excel. There should be at least one sheet visible. That's why we have "forAll" and "defaultSheetName" parameters. Check description in the image below -