Libor.Log

Friday, August 04, 2006

Moving blog to WordPress

I have decided to move my blog pages to LSBblog.WordPress.com side. The main reason for this change is much better entries manageability, higher information value (i.e. calendar of posts) and wider range of options (i.e. I can set for each posts for example categorization, abstract, etc.). I hope you will like this improvement as I do.

Monday, April 03, 2006

Use of Dynamic SQL vs. Stored Procedures in projects

Barry Gervin’s blog entry is picking up almost Hamlet’s question. The story topic is whether stored procedures or generated SQL is the one to use in application to access DB engine.

I guess as almost all cases in software this depends on circumstances as well. In our company we have long experience with both approaches and stored procedure is preferred winner although dynamic SQL is used extensively as well.

General rule we are following can be summed up to following:
1. Use stored procedure on DB which belongs and is fully controlled by our program
2. Use dynamic SQL in case we are accessing 3rd party storage

Why stored procedures?
The main reason why to select stored procedures for our DB is to think about them as about interface methods of data storage object. In this case can/must application programmer use just offered stored procedures (SP) and does not need to care about details in database implementation. DB programmer than apply whatever SQL “tricks” to make data stored efficiently and effectively. This DB tweaking is mostly useful when data are retrieved from DB using complex selection criteria “paging” and sorting with extensive use of temporary tables.

We have got large benefit from stored procedure even when have moved implementation from “standard” SQL access to hot-hot in-house failover solution. In this case additional support on DB layer was completely hidden from callers SP interface.

Management of production system is also one of the key factors influencing our decision what to use. We have find out with use of SP we are much better off when critical bug on data/DB layer is discovered in production environment and must be addressed within minutes/hours. I think this case is not quite usual requirement for “standard” application but in our case this approach saved our back several times already (i.e. because our trading system is dependent on third party is not really possible get definition and test all data combinations our application can receive from 3rd party system/exchanges. Problem is that once we get such “unusual” data we are usually forced to use/handle them as they are critical to client business. This is major reason we need to get almost instant intervention even on production system). With dynamically generated SQL, which is inherently part of the binary, this quick & dirty “hacking” isn’t simply possible to do in such short time.

Additionally with use of stored procedures we can get more system safety and stability when is done regular upgrades for different client with different version. In this case can multiple clients with different binary versions use same version of DB. Software with older version of binaries can safely run with updated DB as SP’s have default newly introduced parameters appended always at the end and set with some reasonable default value(s).

Last but not least point why we are using stored procedures is unit/integration testing. This point was not even mentioned in any of blog entries. With stored procedures we can do much more cleaner and fine grained unit testing and code review as specific details for each execution environment are cleanly separated.

When dynamic SQL is used?
On other side if we are not controlling database engine where we are storing/retrieving data we are using dynamic SQL. In this case we don’t need to do any modification external system which is always welcomed by our clients. Unfortunately we must know their DB schema in quite detail and relay on their capability to inform us when modification is done.

My Recommendations
If you have available DB developer(s) and you are expecting to develop mid or large size application which uses its own DB than preferred choice is use SP. I believe with this selection you will gain big advantage in long term.

If used DB is external to your application or application is really small in size than dynamic SQL can give you simpler code management. Unfortunately price you will pay is less clear source code as you are mixing SQL and programming language of your choice.

Thursday, March 30, 2006

Rotor ver. 2 out of doors

Microsoft released this week new version Rotor 2. New features included are:

Full support for Generics.
New C# 2.0 features like Anonymous Methods, Anonymous Delegates and Generics
BCL additions.
Lightweight Code Generation (LCG).
Stub-based dispatch.
Numerous bug fixes.

This is package is especially useful when one wants to know what source code is behind of .NET in detail. Previous version saves me a lot of time when I wanted to look, how some parts are done and why it does not work in my program.

Monday, March 06, 2006

.NET Application Memory and Performence Tips

I have spent lately some time to investigate possibilities to reduce memory footprint and increase processing speed of our trading application.

This is especially key factor for us due offered fully managed client service where server side components as well as presentation F/E application are only running inside dedicated datacenter.

Good side of datacenter hosted service is the users are connected via web from virtually any place around the world. This gives us opportunity to offer web based access for standalone and rich applications while deliver high performance and not need to worry about installation and compatibility issues of OS environment on client side.

Unfortunately with this approach, where all applications are running in datacenter, we have got another aspect to the game. Our biggest problem with .NET application is memory usage on several applications running in parallel on single machine.

If you have GUI rich application than as minimum you will start with 20-30MB of RAM. In our case where used might want to see in one grid all its/foreign data we can collect from server side services as much as 200-300MB of data per user. With 4 GB on HW box we can end-up with only 10 sessions per server. Obviously so low number of sessions can therefore significantly add to cost for our service offering.

When I did a survey to get more detail information about memory and performance options on .NET applications I compile following list of the most interesting tips one can use. I hope this list might be useful to someone else.

.NET has 3 GC modes:

Workstation GC with concurrency ON (default for all but ASP.NET)
Workstation GC with concurrency OFF
Server GC (default for ASP.NET)

Workstation GC with concurrency is usually the best option for F/E applications where is required app. responsiveness while is GC executed
Server CG is using dedicated GC thread per CPU therefore number of machine processors determine how many GC threads are created. As server application typically runs only one on dedicated machine number of GC threads is not usually problem.
Server GC is accessible only via hosting APIs on 1.1 NET (look at CorBindToRuntimeEx). In 2.0 NET this shall work over config file.
Default on W2k3 is Workstation GC
Server GC collects faster on MP machine due to multiple GC threads working together compare to Workstation GC when you are collecting the same size of heap
Good health indication for GC gen0 & gen1 generations is 10:1 (i.e. 1000 on gen0 and 100 on gen1, etc.)
Large objects are going straight to gen2 if size is >= 85kB
A good practice with large objects is to allocate one and keep reusing it so you don’t incur more full GCs
Use pinning wisely and only on shortest possible time. If use pinning than do that on large object heap as there aren’t object reallocated or pin buffers in Gen2 objects as there are not much moved/compacted objects
To look what is on GC heap and what is holding it there for application (i.e. what is occupation of each GC gen) is available via vadump & CLR profiler. Use them whenever you can for performance application tuning.
Microsoft internal testing showed with many ASP.NET server processes running on one machine the Workstation GC (no Concurrent GC) proved to be a good choice compare to Server GC (reason is the user thread that triggered the GC and doesn’t create addition threads)
Don’t make many small DLL’s. It takes more time to load them & verify them. VM also always consumes 64k of memory even the DLL has only 10k. Many such DLL’s therefore might contribute to large memory footprint and even be a source of OOM (Out of Memory) exception thrown by VM.
GC becomes more aggressive if HW box gets low on physical memory. If GC becomes more active than application becomes less responsive as more GC is happening.
If application is creating a lot of temporary large objects the app. is probably in bad situation. Large objects are getting on gen2 and than GC clear them. Unfortunately gen2 is the most expensive for GC this to do.
Is not recommended to run GC.Collect explicitly from application. If application does that than GC timing is damaged and this farther makes application underperforming.
If fragmentation on gen2 level 20% is considered very good.
What to do when analyzing common CLR performance problems is written at http://blogs.msdn.com/akhune/archive/2004/06/11/153734.aspx

Next useful reading:

I would be interested to know if someone must face same problems with application running in limited resources to address memory/performance issues and what is the recomended way to approach it.

Wednesday, January 18, 2006

Will SSL win over IPsec in use?

We are lucky to have chance to run our electronic trading application in “standardised” data-centre environment where HW is build exactly on requirements imposed by business and application (i.e. high performance for P2P and publish/subscribe, support of hot-hot fail-over from top/down even network layer, etc.).

In case of communicating with external sources we are heavily relaying on use of SSL as less complex and with smaller processing overhead compare to IPSec. On data-centre side is executed SSL on HW level as we are using Juniper router with build in support of SSL.

Yesterday I have come across interesting article where Gartner forecasts IPSec is dead for wide spread use. I’m glad we were bet on right “horse” when we start to build our application and pursued network guys to take SSL preference compare to IPSec.

Important role of nonfunctional requirements for developing distributed system

Is quite interesting how some problems are accumulating on one place at the same time to bring quite big attentions to software capabilities and trigger in client’s mind question if their ISV supplier can suffer same problems.

TSE (Tokyo Stock Exchange) becomes quite popular these days in financial news for reporting problems with they execution engine.

Yesterday TSE made announcement regarding problem with capability of handling traded order volumes where they stated they can’t handle more than 4 millions a day. They thinking they will “temporary” address this problem by shortening trading hours which seems to me as false assumptions as traders & their clients tends to squiz same order volume to smaller time frame if they know beforehand this is limitation.

In last mid December there was another incident on TSE. In this case Mizuho Securities Co. trader was allowed to mistakenly placed a sell order for 610,000 shares of J-COM at 1 yen, instead of the intended 1 share at 610,000 yen. In the financial industry was even this mistake named as “fat finger” error.

Those errors are clear evidence of TSE trading engine is not robust and scalable enough. Robustness and scalability as nonfunctional requirement were heavily underestimated or no review was executed to validate if constrains of use in production are still same.

During my professional career I experienced same problems (i.e. lack of validation of non-functional requirements during lifetime product) as now people facing in the TSE. My recommendation to it is therefore be pro active and rather prepare software for higher requirements than rigidly keep to stated constrains. Unfortunately make efficient extrapolation from “standard” requirements for distributed system is particularly problematic and requires a large knowledge in technology as well as in business processes.

BTW: I’m wondering how Japan Securities Clearing Corporation can come up with settlement price different than maximum stock price traded especially in case stock is traded during IPO as happened in case of J-COM.

Friday, January 06, 2006

Removing duplicate lines in SciTe

In my work I’m quit often using ScitTE editor as better alternative to Notepad. I especially like its compacted version which takes only less than 0.5MB on disc (unbelievable binary size these days!) including support for varies programming languages out of box (XML, HTML, C#, CPP, Java, SQL and Batch among the most important for me).

What I have long time missed in program is sorting and more importantly removing duplicates text lines. I know there is possibility to extend it with Lua scripts but I was hesitating to venture in new territory.

Finally I have got enough courage and step into “cold water” and wrote several functions in Lua because I could not find any help via Google or any other web common search source.

Required SciTE modification steps are as follows:

Load Lua interpreter and put it into SciTE installation directory

Add to SciTEUser.properties file following section (you might put it to Global/Local option file if you wish so)

ext.lua.startup.script=
$(SciteUserHome)/LuaStartUp.lua

command.name.1.*=Load Lua
command.1.*=dofile $(FilePath)
command.subsystem.1.*=3

command.name.2.*=Sort Text
command.2.*=sort_text
command.subsystem.2.*=3
command.mode.2.*=savebefore:no

command.name.3.*=Sort and Remove Duplicates
command.3.*=remove_duplicate_text
command.subsystem.3.*=3
command.mode.3.*=savebefore:no

Create file LuaStartUp.lua on place where is pointing variable ext.lua.startup.script as defined in previous step. Content of file is as follows:

function sort_text()
local buf={}
local out=""
buf = sort_text_base(editor:GetSelText())
out = table.concat(buf,"\n")
editor:ReplaceSel(out.."\n")
end

function remove_duplicate_text()
local buf={}
local out = editor:GetSelText()
buf = remove_duplicate_base(
sort_text_base(out))
out = table.concat(buf,"\n")
editor:ReplaceSel(out.."\n")
end

-- Helper functions
function lines(str)
local t = {n = 0}
local function helper(line)
table.insert(t, line) end
helper((string.gsub(str,
"(.-)\r?\n", helper)))
-- parser somehow adds extra item to table.
-- Following line compensate for it.
if t[table.getn(t)] == nil or
t[table.getn(t)] == "\n" or
t[table.getn(t)] == ""
then table.remove(t,table.getn(t)) end
return t
end

function remove_duplicate_base(sort_buf)
local buf={}
local line=sort_buf[1]
local counter = 2
table.insert(buf,line)
while table.getn(sort_buf)
>= counter
do
if sort_buf[counter] ~= line
then
line = sort_buf[counter]
table.insert(buf,line);
end
counter = counter + 1
end
return buf
end

function sort_text_base(sel)
local buf={}
local function remove_empty(pos,item)
if item == nil then buf.remove(pos) end
end
buf = lines(sel)
-- make sure no null items are in the list
table.foreach(buf,remove_empty)
table.sort(buf)
return buf
end

Restart SciTE. Once done you should see in Tool menu three new items. If you left setup of shortcuts as defined in step 2 than you might start to use it.

Note: this works only on on text selection in editor.

Removal of duplicate lines is based on example of sort text lines in SciTE on Lua-Users web site. Presented code and steps are far form perfection. There is big room to improve it but it works for me. Now it might help others if they need it.

Wednesday, January 04, 2006

Declare table ver. temporary table definition execution speed in MS SQL

I have met recently with interesting problem of execution speed when use temporary table definition in SQL server.

Issue we have had to solve is to address execution speed when was used temporary table inside SP (stored procedure). Particularly this problem was related to getting chunks of data from one of the table sets. Main table contains large amount of data records (~ million+ of rows) with other 4 dependent tables (via “foreign key”) where each “child” table has between 2–8 records per one main table record.

Main table looks like following:

CREATE TABLE [og].[Orders]  (
 [orderid] [bigint] not null ,
 [serviceid] [nvarchar] (128) not null ,
 [companyid] [int] not null constraint
 [df_orders_companyid] default (1),
 ...
 ...
 ...
 [transportid] [bigint] null,
 constraint [pk_og_orders] primary key clustered
 (
  [orderid],
  [serviceid] ) with fillfactor ="" 90 on [primary]
 ) on [primary]

Chunks are selected in “one by one” with following SP excerpt:


...
CREATE TABLE #Orders
  -- comment out when running C# class generator
-- DECLARE @Orders TABLE
 -- uncomment when running C# class generator
  (
    OrderId BIGINT PRIMARY KEY CLUSTERED
  )
SET ROWCOUNT @PerPage
INSERT INTO #Orders
 -- comment out when running C# class generator
-- INSERT INTO @Orders
 -- uncomment when running C# class generator
SELECT
 a.OrderIdFROM [og].[Orders] a (NOLOCK)
WHERE a.ServiceId  = @ServiceId  AND
 a.OrderId > @StartOrderId
   ...
ORDER BY OrderIdOPTION (KEEP PLAN)
SET ROWCOUNT 0
SELECT  a.OrderId,  a.ServiceId,
  a.CompanyId,  ...  a.IsTradePriceBetter
FROM [og].[Orders] a (NOLOCK)
  INNER JOIN #Orders b ON (a.OrderId = b.OrderId)
 -- comment out when running C# class generator
--  INNER JOIN @Orders b ON (a.OrderId = b.OrderId)
 -- uncomment out running C# class generator
WHERE a.ServiceId = @ServiceIdORDER BY  a.OrderId
OPTION (KEEP PLAN)
...
...

MS SQL docs states following sentence “In general, table variables contribute to more efficient query processing.”

When we have run SP on real data (~1 million of records where average ServiceId has around 20-30 thousands records and one service has around 0.5 million) with alternative temporary table definition (see commented sections in code excerpt above) we have got quite interesting results (All data were gained on SQL 2000 installed in default configuration on “dual Xeon” machine with 1.5 GB of memory and SQL Analyzer).

When we have used declare table statement (i.e. in memory table) than execution time for getting arbitrary 1000 items on 0.5 million set around 7 seconds. When we have switched to temporary table definition (i.e. table is created in tempdb) we have got 0.5 second!

We have had to opt for temporary table use to addess execution speed even this approach rises issues with setup of string collation between business DB and tempdb where is created temporary table..

No one from DB experts around was able to give us reasonable explanation what is going on inside SQL server. I wander if someone else has same experience with it.

Additionally we have met with problem of getting setup data from SP when were used temporary table definitions. Our in-house tool tryed to read DB structure and SP setup over OLE DB provider (we have selected OLE provider because only this one has some documentation and access rights to be able to supply required structural and SP data). Net result of program run on SP’s with temporary table inside is thrown non-descriptiove OLE DB exception.When we have used declare table statement than tool was pretty save and all structural/setup data were collected successfully?!?

Programming and cooperation with SQL serve is not really easy task when one needs to get more than just basic support/functionality out of it.

Name: Libor SOUCEK

Location: Prague, CZ

Libor is Chief Software Architect at FFastFill plc. Before FFastFill, Libor worked for four years as chief software developer at Ford Research Laboratory in Dearborn (USA). Project in Ford involved development of rule based system for automated STP (Straight Through Processing) of design, analyze and optimization power-train mechanical parts running on wide range of UNIX systems. Next two years Libor spend as project manager at Schemantix where he worked on innovative B2B market places engine based on JAVA and XML . Part of the solution was integration with SAP systems via XML data exchange. In 2002 Libor becomes a Chief Software Architect at FFastFill. While at FFastFill he architected wholly new version of SOA based electronic trading system. Trading system unique architecture is fully based on .NET framework (i.e. C#) and independent of underlying network layer. Product is in production in several major brokerage banks and is tuned for near future to be also used as prime order management system on one of the largest Exchanges in the USA. Libor received an M.Sc. in Mechanical Engineering degree for Numeric Simulationof Combustion Engines from Czech Technical University in 1993.

View my complete profile