DesignAndCode.net

Thoughts on design, code, code management, database design and more

Computer Hard Drive Law - Not a Matter of If, but a Matter of When

Recently I went to boot up my home workstation - the machine where I started the restructure of my blog site - and it asked me to insert a recovery disk, which I had never created.

My boot drive was a 1TB SSD, and based on my notes it had died after three years.

I have an MSDN license and thought I would just need to download the ISO image and rebuild the machine with a new hard drive.

The latest image, Windows 10 1903, had notes saying you had to start with an earlier version - which I ignored.

I found that the 1903 image was now over the 4.7GB standard DVD size, so it looked like I needed to buy some dual-layer writable DVDs. Twenty disks at $34 and a new 500GB SSD for $44 felt like a good start.

The write to the dual-layer disk looked like it worked, but when I put the DVD into the drive for the install, it was not a bootable image. I thought maybe I could find instructions on making a bootable USB drive, and as the installs were going onto that USB stick it occurred to me that when I built the box three years ago it was Windows 10 and I had a bootable disk. I dug a box out of storage - the disk worked, and the printed license was in the box.

You forget how much goes into building a computer - until you have to build it again.

Windows came up, but no network. I had to use my personal laptop to download drivers for the network, audio, and chipset, and I had to go to the video card manufacturer for their audio/video drivers.

Once I had a network connection, I went to Windows Update and let it run through three years of updates, which lasted into the night.

In the morning the Windows version had reached 1903 and I was able to start re-installing the software I needed.

I realized that one of my drives had not been recognized, and I had to go to Disk Management to make it visible.

I had a few years of files on that drive that I was afraid I had lost.

I did lose my photo images. I have a catalog backup with thumbnail images of the original photos.

If I find a window of time to locate some backup disks with the photo images - I may be able to rebuild that.

Losing years of cataloging and indexing effort on those photographs, and perhaps not being able to recover them, is making me take a look at other options.

I may find other things missing - which may surface around doing my taxes next year.


You just never expect the hard disk crash, but you do need to be better prepared. (Which I should have been.)


Project Structure on Disk for .NET Core

In looking at a starting .NET Core project, the starting point does not come with a "structure"; Visual Studio projects usually establish a starting project structure, which feels familiar.

I found that David Fowler (who works on the ASP.NET Core team) suggested a structure back in 2014.


$/
  artifacts/
  build/
  docs/
  lib/
  packages/
  samples/
  src/
  tests/
  .editorconfig
  .gitignore
  .gitattributes
  build.cmd
  build.sh
  LICENSE
  NuGet.Config
  README.md
  {solution}.sln

• src - Main projects (the product code)
• tests - Test projects
• docs - Documentation stuff, markdown files, help files, etc.
• samples (optional) - Sample projects
• lib - Things that can NEVER exist in a NuGet package
• artifacts - Build outputs go here. Doing a build.cmd/build.sh generates artifacts here (nupkgs, dlls, pdbs, etc.)
• packages - NuGet packages
• build - Build customizations (custom msbuild files/psake/fake/albacore/etc.) scripts
• build.cmd - Bootstrap the build for Windows
• build.sh - Bootstrap the build for *nix
• global.json - ASP.NET vNext only

.gitignore

[Oo]bj/
[Bb]in/
.nuget/
_ReSharper.*
packages/
artifacts/
*.user
*.suo
*.userprefs
*DS_Store
*.sln.ide

Reaching into the TPL

At my current employer, we have a series of Windows Services that handle receiving notifications of external system changes (adds, updates, deletes), parsing those notifications to store in database table structures, and then publishing the notifications internally for consumption by applications within the enterprise.

The first of those services uses IBM MQ messaging and was written to use event-based notifications, with multiple Tasks listening for events from MQ. This mechanism effectively hooks into the MQ broker and provides a callback on the connection, using an AutoResetEvent to signal when there is an event to process; when MQ has an event, it calls into the "listener" to invoke the message capture. The capture is critical - once you read these messages they do not stay around - so persisting them without exposing any sensitive information is essential. The process tries to store the event into a SQL Server database; if there are any exceptions it does a rollback to MQ, otherwise a successful store commits the message on MQ so that we do not receive it a second time. With multiple tasks listening, this service is effectively doing parallel processing within the context of a task.
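
The IBM MQ client details are more than I want to paste here, so here is just a rough sketch of the listener shape - the IMessageSource interface and its ReadUnderSyncpoint/Commit/Backout methods are hypothetical stand-ins for the real MQ calls, not the actual IBM MQ API:

using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the MQ client pieces - not the real IBM MQ API.
public interface IMessageSource
{
    event EventHandler MessageAvailable;   // broker signals that something arrived
    string ReadUnderSyncpoint();           // read a message inside a unit of work
    void Commit();                         // the message leaves the queue for good
    void Backout();                        // the message goes back on the queue
}

public class Listener
{
    private readonly AutoResetEvent _signal = new AutoResetEvent(false);
    private readonly IMessageSource _source;
    private readonly Action<string> _storeToSql;   // persists the message body

    public Listener(IMessageSource source, Action<string> storeToSql)
    {
        _source = source;
        _storeToSql = storeToSql;
        _source.MessageAvailable += (s, e) => _signal.Set();   // the callback only signals
    }

    public Task Start(CancellationToken token)
    {
        return Task.Factory.StartNew(() =>
        {
            while (!token.IsCancellationRequested)
            {
                if (!_signal.WaitOne(TimeSpan.FromSeconds(1)))
                    continue;                       // wake periodically to honor cancellation

                var body = _source.ReadUnderSyncpoint();
                try
                {
                    _storeToSql(body);              // persist first - the message will not come again
                    _source.Commit();
                }
                catch
                {
                    _source.Backout();              // let MQ redeliver; log and carry on
                }
            }
        }, token, TaskCreationOptions.LongRunning, TaskScheduler.Default);
    }
}

Several of these Tasks run against the same queue, which is where the parallel processing within the service comes from.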

The other two services had been single threaded in processing the notifications: first a service that reformats the event data to create database lookup keys, and then a service that builds new messages to publish on an internal MQ broker, so that the business processes can receive the notifications of the add/update/delete events as they need. Being single threaded was fine for over a year, but its ability to serve one of the lines of business was called into question when a massive set of events arrived.

Over the weekend more than 60K events hit, and when I got called in to look at the issue there was a backlog of 37K events; the 27K or so events that had processed had been very slow in the eyes of the business starting to use this system. So I was tasked with providing throughput that would surpass the publishing source, so that these processes were not a bottleneck.

I found that there was an exception occurring within the events, which are XML bodies: a particular node often contained an apostrophe, which would cause an exception. The exception was being caught and retried successfully, but it was throwing the parsing logic out of its processing loop midstream; it would then have to reload a batch that included the unfinished set of data (reading the same data twice, plus the exception-handling performance hit). So I first added code to detect this kind of issue before the database inserts and give the XML the right encoding to avoid the exception, streamlining the batch of work being done.
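
I will not reproduce the production change, but the idea was roughly this (the node name and the doubled-apostrophe escape are just illustrations; parameterized commands are the cleaner long-term fix):

using System;
using System.Linq;
using System.Xml.Linq;

class ApostropheGuard
{
    // Illustrative only: pull a value out of the event XML and make it safe
    // before it is used to build an INSERT statement.
    static string GetSafeValue(string eventXml, string nodeName)
    {
        var doc = XDocument.Parse(eventXml);
        var value = (string)doc.Descendants(nodeName).FirstOrDefault() ?? string.Empty;

        // Doubling the apostrophe keeps a hand-built T-SQL literal from breaking;
        // a parameterized SqlCommand avoids the issue entirely.
        return value.Replace("'", "''");
    }

    static void Main()
    {
        var xml = "<Event><CustomerName>O'Brien</CustomerName></Event>";
        Console.WriteLine(GetSafeValue(xml, "CustomerName"));   // O''Brien
    }
}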

Then I saw that the logic was using an index into an array of the batch of data to populate the insert statements being built. I considered SqlBulkCopy, but exceptions there would have complicated the code more, and I had been given a short time to do all the work. So I took the loop that was building a SQL connection and command and changed it to a Parallel.For loop. On the first try I left the connection outside the loop, but that was causing connection-close exceptions, and I had to move the connection inside the Parallel.For body - a lambda delegate that the .NET framework can run in parallel.
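
The shape of that change was roughly the following - the key point being that the SqlConnection and SqlCommand are created inside the loop body, so each parallel iteration gets its own pooled connection (the table and column names here are made up, not the real code):

using System.Data.SqlClient;
using System.Threading.Tasks;

class ParallelInsertSketch
{
    // batch is an already-parsed array of event values; names are illustrative.
    static void InsertBatch(string[] batch, string connectionString)
    {
        Parallel.For(0, batch.Length, i =>
        {
            // Creating the connection inside the lambda is what fixed the
            // connection-close exceptions - connections are not thread safe,
            // so each iteration uses its own (pooled) connection.
            using (var conn = new SqlConnection(connectionString))
            using (var cmd = new SqlCommand(
                "INSERT INTO dbo.ParsedEvents (EventBody) VALUES (@body)", conn))
            {
                cmd.Parameters.AddWithValue("@body", batch[i]);
                conn.Open();
                cmd.ExecuteNonQuery();
            }
        });
    }
}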

The production processing time frame had been estimated and timed for the old process. One of the errors that occurred during the production run with the new line of business was a field that exceeded the size of the table column we parse the data into, and we had to restructure the database to allow for the new length of that field (3 hours to do that). Processing the remaining records would have taken hours with the single threading. The existing code had timing logic built into the Debug version; setting the timing to 100 records per batch, I measured the single-threaded time, then tested again after changing to the Parallel.For loop, and my measurements showed up to an 86% improvement in throughput. I took the 37K records in a restored copy of the PROD database from the restructure and processed them in under 10 minutes.

The vendor's highest publishing rate was about 17K per hour, and based on the 37K that I processed, the new parsing throughput should be over 200K per hour - so the parsing was not going to be a bottleneck.

The existing code in the notification program that publishes to MQ was using a foreach loop, and I was able to change that to a Parallel.ForEach, passing the service's CancellationToken into the options for the Parallel.ForEach so that the process stops when the service stops. The throughput was measured single threaded and then in parallel, and showed a 76% improvement.
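
Roughly, that change looked like this sketch - the message collection and the publish call are placeholders for the real notification objects and MQ publish:

using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

class PublishSketch
{
    // messages and publish are placeholders for the real notification objects and MQ call.
    static void PublishAll(IEnumerable<string> messages,
                           Action<string> publish,
                           CancellationToken serviceToken)
    {
        var options = new ParallelOptions { CancellationToken = serviceToken };

        try
        {
            Parallel.ForEach(messages, options, msg => publish(msg));
        }
        catch (OperationCanceledException)
        {
            // The service is stopping; the loop ends between iterations.
        }
    }
}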

I have console versions of the Windows Services for just this kind of testing, using a Console.ReadKey to trigger the cancellation token that the service uses at the Task level - and that is now re-used in the ParallelOptions for the ForEach.
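
The harness is basically this pattern - wire a CancellationTokenSource to a key press and hand its token to the same worker code the service runs (RunService here is just a stand-in):

using System;
using System.Threading;
using System.Threading.Tasks;

class ConsoleHarness
{
    static void Main()
    {
        var cts = new CancellationTokenSource();

        // RunService stands in for the same worker the Windows Service starts,
        // using the token both at the Task level and in the ParallelOptions.
        var worker = Task.Run(() => RunService(cts.Token), cts.Token);

        Console.WriteLine("Running - press any key to stop.");
        Console.ReadKey();
        cts.Cancel();                 // the same signal the real service sends on stop

        try { worker.Wait(); } catch (AggregateException) { /* cancellation may surface here */ }
    }

    static void RunService(CancellationToken token)
    {
        while (!token.IsCancellationRequested)
            Thread.Sleep(250);        // stand-in for the real processing loop
    }
}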

It was satisfying to take the knowledge I had gathered over the last couple of years on concurrent programming, apply it to the programs here, and solve the business concern about processing speed.

The Task Parallel Library was introduced with .NET 4 in Visual Studio 2010 and has been improved in the versions since. Parallel.For and Parallel.ForEach differ a little in how they are applied, but the results are very satisfying.

Cross Apply XPath and XML

In my previous post, I mentioned that we had an issue with an XML column within a table from a vendor we use.

From the vendor's point of view, each customer would probably need different data that they would want stored as metadata for a document in their database, so using an XML structure probably made sense.

In the query issue we had, a "cross apply" was used for the XML, and it left the assumption that any reader just knew what that meant and why a "cross apply" was even used here.

A number of years ago I was attending a software conference, and someone told me they knew some people who were going to put up a new question-and-answer website for technical questions, to be called Stack Overflow; I was an early adopter and have mentioned it to anyone learning software development. For this subject of "cross apply" there is a good answer there on why cross apply is needed when using XPath queries: the .nodes() function returns a rowset - which the FROM clause needs - and working from that rowset can shorten the amount of XPath needed in the SELECT.
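
As a rough illustration of the shape of such a query (the table, column, and node names below are invented, not the vendor's schema): .nodes() turns the XML column into a rowset, CROSS APPLY joins that rowset back to each outer row, and .value() pulls the individual nodes out in the SELECT.

using System;
using System.Data.SqlClient;

class CrossApplySketch
{
    // dbo.Documents and its xml column MetaData are hypothetical.
    const string Query = @"
SELECT d.DocumentId,
       f.Node.value('(Name)[1]',  'varchar(50)')  AS FieldName,
       f.Node.value('(Value)[1]', 'varchar(200)') AS FieldValue
FROM dbo.Documents AS d
CROSS APPLY d.MetaData.nodes('/Document/Fields/Field') AS f(Node);";

    static void Main()
    {
        using (var conn = new SqlConnection("Server=.;Database=Sandbox;Integrated Security=SSPI"))
        using (var cmd = new SqlCommand(Query, conn))
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                    Console.WriteLine("{0}: {1} = {2}", reader[0], reader[1], reader[2]);
            }
        }
    }
}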

It is that additional rowset query, applied per row, that made the vendor query act like a table scan within a table scan and perform so badly.

The previous post calls out the user-defined function that extracts a single node into a computed column, and then putting an index on that column - taking the query time from 4.5 minutes to a subsecond response, since the computed column eliminated the need for any cross apply and for a subquery into an XML column's rowset.

SQL Server XML Column Performance

Recently at work, we had a stored procedure that was timing out. The stored procedure joined across three vendor tables, with a "cross apply" into an XML column in two of the tables.

There is a line of thinking I have often encountered while working within an enterprise that you would not normally consider modifying a vendor's table structure. But the "cross apply" pulling specific XML nodes out of that XML column was taking over 4.5 minutes to return a result, with only 845K rows in the main table.

One of my co-workers suggested using temporary tables within the stored procedure, which is actually a better way to go than what was there (which came originally from the vendor). That version extracted the records from the main table that contained the specific XML nodes being looked for, and then joined the temp tables together, reducing the overall table scan and then the XML node work. This dropped the time down to about 1.4 minutes - still not really acceptable.

Response times over 30 seconds hit the default SQL Server command timeout, and without a better option we would have to add a timeout setting to the command that retrieves the records. It is not a great user experience to have to wait more than 3 seconds.
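
Raising that timeout is just a property on the command - something like the snippet below (the proc name is made up) - but it treats the symptom rather than the cause.

using System.Data;
using System.Data.SqlClient;

static class CommandFactory
{
    // Proc name is illustrative; SqlCommand.CommandTimeout defaults to 30 seconds.
    public static SqlCommand BuildSearchCommand(SqlConnection connection)
    {
        return new SqlCommand("dbo.GetDocumentSearchResults", connection)
        {
            CommandType = CommandType.StoredProcedure,
            CommandTimeout = 120   // give the slow query two minutes instead of timing out
        };
    }
}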

To review the details of what was eating up the time, beyond the execution plan, I got curious about the performance of table variables vs. temp tables - and verified that temp tables were the better option in a stored proc.

One article I came across measured the performance using

SET STATISTICS PROFILE ON
SET STATISTICS TIME ON

-- <some query being measured>

SET STATISTICS TIME OFF
SET STATISTICS PROFILE OFF

which surfaced very detailed aspects of what was going on in using the cross apply for the XML - it was like a table scan within a table scan, as it has to scan through all the nodes of the XML within each row.

That led me to research performance tips for an XML column.

That research surfaced the idea of writing a user-defined function (UDF) that returns the value of the XML node you are after as a computed column, and then putting an index on that computed column.
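
The pattern looks roughly like this - the table, column, and node names are invented, and depending on the SQL Server version you may also need the computed column to be PERSISTED before the index is allowed, so check the documentation on promoting XML values with computed columns:

using System.Data.SqlClient;

class PromoteXmlNode
{
    // A schema-bound UDF wraps the xml value() method, because XML methods
    // cannot be used directly in a computed column definition.
    const string CreateFunction = @"
CREATE FUNCTION dbo.udf_GetDocTypeCode (@meta xml)
RETURNS varchar(50)
WITH SCHEMABINDING
AS
BEGIN
    RETURN @meta.value('(/Document/TypeCode)[1]', 'varchar(50)');
END";

    const string AddComputedColumn =
        "ALTER TABLE dbo.Documents ADD DocTypeCode AS dbo.udf_GetDocTypeCode(MetaData)";

    const string CreateIndex =
        "CREATE INDEX IX_Documents_DocTypeCode ON dbo.Documents (DocTypeCode)";

    static void Main()
    {
        using (var conn = new SqlConnection("Server=.;Database=Sandbox;Integrated Security=SSPI"))
        {
            conn.Open();
            foreach (var sql in new[] { CreateFunction, AddComputedColumn, CreateIndex })
            {
                using (var cmd = new SqlCommand(sql, conn))
                    cmd.ExecuteNonQuery();
            }
        }
    }
}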

I wrote out the needed function and applied the index in our development system. I altered the query structure from the stored proc that had been taking 4.5 minutes to use this computed column, and now I got subsecond response times. Without measuring the millisecond values, it was roughly 275 times faster. We need to see whether it has any other impacts on the database, but so far this is our best solution.

It is only because the computed column relies on values in the row itself that this makes sense - I do not think a computed column that had to look across lots of rows would perform well; this specific case, though, did.



Updating Blog Offline

I had used my webhost's software to deploy the BlogEngine software on this site, when I gave up on trying to figure out the DasBlog security features in their .NET 4.0 version; I needed something simpler.

I had issues with what initially deployed on my webhost, and had worked out parts of a fix for the software when my webhost suggested moving to a newer server. By that time a newer version of BlogEngine, 3.1.1, was available from my webhost. There were still odd things from the BlogEngine install that needed to be cleaned up. So I decided to download the folder that had the BlogEngine software, and found that I could run it locally with Visual Studio by doing an "Open website"; it ran with the same user ids that were created in the setup. This gave me the opportunity to try to fix it locally.

I had a starting point and made the folder a Git Repository and did an initial commit - so that I could get back to the starting point if I messed up.

I was able to fix a few things trying out some other Themes, and even doing a local upgrade to 3.3.6 and then uploading the result back to my webhost.

With Git and Visual Studio - I could make the changes and verify that the site worked and commit to moving the software forward.

There are still some odd parts to using BlogEngine, but I will see if I can enhance it to work better for me. (When making blog entries, the box where you need to type is the same color as the background, with no border to indicate it is there - until you click and start typing. It is controlled by a software project called "summernote", and from what I can see it is using an older version.)

From the GitHub site, the software page now goes to a BlogEngine hosting site, if you do not want to own your domain for blogging. I also found a post that wondered if the code base is "dead", arguing that the number of contributions has dropped off. We will have to see what we can fix, as they say there are a number of bugs in the code.



Making it Stick

In the last quarter of 2018, I came across a post about learning things better. Some of the points made were ones I had not seen before - be sure you are well hydrated and get enough sleep. The post also mentioned a book called "Make It Stick", which I purchased.

I am still in the process of reading the book, but one of the key points so far is the idea of recalling the information repeatedly. The book gives multiple examples of how repeated testing on knowledge shortly after learning makes information retention much more successful. A pop quiz actually helps ingrain the information for better recall.

Taking classes in anything and then not using it creates a use-it-or-lose-it situation. Learning something just as you are about to use that knowledge makes it much more permanent. So doing the exercises from a chapter in a book helps retain the information - and making your own exercises for the information is even better.

I have recited the phrase "you have to teach in order to learn" - educators refer to this as "learning by teaching", where students learn material by having to teach it to others. This causes you to refine the information you have learned, putting it into your own words and being able to express your understanding to others, independent of the material.

Learn by doing is what exercises are for - they force that retrieval of information, whether it is technical in nature, a musical instrument, or cooking. You learn it better when you retrieve the information and apply it; then it penetrates more deeply.

I have read in some of my chess books that you have to play a lot of chess, applying what you have learned and seeing how it works out. One story from the 18th century had a person learn the basic moves and, in the course of a day, learn enough to win a game by the end of it. They were so enthusiastic that they went off and read chess material, playing out the moves on a board and doing almost nothing else (not eating or drinking) - and when they returned to play against another person they lost repeatedly. They had read a great deal and had a lot of theory, but they needed to actually apply that information repeatedly so that the patterns of how things worked sank in; they needed to be able to use that knowledge independent of the material.


C# Language Features - after the fact

As a developer working on a team, there are often newer features in the language that people on the team just do not know about. The greater part of our team's code base was written in VS2008 and VS2010, so that would be .NET 3.5 and 4.0.

A while ago, no one here had started to use the Task class. I found myself introducing that class into our code, learning more about how it should be used, and teaching others how these features work. Adding business value takes priority in a company before technical debt issues get addressed; the need for better response times brought Task to the forefront, and introducing it both added business value and moved the code forward from .NET 3.5 to .NET 4.0.
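
A typical first use on .NET 4.0 looked something like this sketch - push the slow work onto a Task and pick the result up in a continuation (the method names are generic stand-ins for our actual code, and Task.Run plus async/await came later with .NET 4.5):

using System;
using System.Threading.Tasks;

class TaskIntro
{
    static void Main()
    {
        // .NET 4.0 style: Task.Factory.StartNew plus a continuation.
        Task<int> lookup = Task.Factory.StartNew(() => CountPendingOrders());

        Task done = lookup.ContinueWith(t =>
            Console.WriteLine("Pending orders: {0}", t.Result));

        Console.WriteLine("Caller keeps going while the lookup runs...");
        done.Wait();   // only so the console sample does not exit before the continuation runs
    }

    static int CountPendingOrders()
    {
        System.Threading.Thread.Sleep(500);   // stand-in for a slow query
        return 42;
    }
}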

Software development is a career that requires constant learning and experimentation, which is why having a variety of information sources is very helpful. Today I was going through my unread emails and found a CodeProject article on what has changed in the 4 years since .NET was open sourced. This is the kind of article I can go back to for adding more of the language features to my toolbox.

The starting link from there goes into the C# 7.2 feature Span<T> and links to the January 2018 MSDN article from Stephen Toub, who is one of the key people at Microsoft I read for showing how things work at a very technical level.
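
As a tiny taste of what Span<T> is about - a view over existing memory (here a stack-allocated buffer) that you can slice without copying or allocating:

using System;

class SpanTaste
{
    static void Main()
    {
        // C# 7.2+: a stack-allocated buffer viewed through Span<T> - no heap allocation.
        // (Requires .NET Core 2.1+ or the System.Memory package on .NET Framework.)
        Span<int> buffer = stackalloc int[8];
        for (int i = 0; i < buffer.Length; i++)
            buffer[i] = i * i;

        // Slice is a view over the same memory, not a copy.
        Span<int> middle = buffer.Slice(2, 4);
        middle[0] = -1;                       // writes through to buffer[2]

        Console.WriteLine(string.Join(", ", buffer.ToArray()));   // 0, 1, -1, 9, 16, 25, 36, 49
    }
}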

That got me to look a bit more, and I came up with Mark Zhou's 10-part series on C# 7 - which I need to go through to start seeing if there are things that will make us more productive.

Learning the new features after the fact seems to be the norm. VS2019 just went into preview, and that will have C# 8.0.

Playing Chess-Chess Engines

I started to learn chess when I was about 12, and stopped by the time I was 14. I was playing a game of chess with my younger brother, and when I took his queen off the board he cleared the remaining pieces with one swoop of his arm - and that was the end of playing chess between us. It would be nice, after these decades, to see if I could get him to try the game again.

I have tried to play at various points, and with computer chess I found I could always start a game; I rarely ever get past the opening, much less win a game. So I have wanted to learn how to play better - I have dozens of books, but only one of them have I actually read cover to cover, and that took months.

I discovered the free software Arena a number of years ago, and it comes with a few "chess engines". From their site I found that there are other free chess engines, and a few commercial ones. Once you have installed Arena (I am on Windows), you just have to locate the exe for the engine and drag-and-drop it onto the GUI and it will install the engine; I tend to drop the zip file into the engine folder of Arena and then install it from there.

Some of the engines I recently installed:

Fire 7.1

Stockfish 10

Komodo 9

Andscacs (July 2018)

There are a number of sites that spend time getting these engines to compete; ChessOwl is one, and ComputerChess is another that I came across. If the ratings these sites show are accurate, I might not win a game for a very long time, and only after a lot of work and losses to learn from.

I did run a short tournament in Arena to find the weakest of the engines (AnMon 5.75, then SOS 5.1 a bit higher), so that I can give myself a better chance of winning to start.

I did run one engine-against-engine game, Stockfish 10 against Andscacs, where Andscacs beat Stockfish in 91 moves and predicted checkmate 17 moves ahead. (The Andscacs site is in Spanish - the engine seems first rate though.)

Blog Restart - 2018

I worked with my hosting provider for a good part of the afternoon, moving to a new hosting server within their data centers and abandoning the DasBlog content that I had created over the last ten years. I selected BlogEngine.NET, and the hosting technicians had to adjust the installation to get the software running - permissions, directory setups; a few things had to be touched and moved to get it set.

"Write to learn" is why I have a blog.

So with this restart of my blog, I hope to provide a value to readers, and organize my own thinking on the many topics that cross my path throughout the year. It really becomes a resource for me to go back to and see my notes on the topics.