Understanding processing time

I want to give a brief tutorial on understanding CPU utilization, since it is commonly an area of confusion during performance analysis.

CPU time

CPU time is the percentage of clock ticks that a processor spends executing instructions. This is opposed to wall-clock time, which is the total elapsed time the whole computer takes to perform an operation. The “load” on a system is the ratio of clock ticks spent performing operations to the clock ticks spent waiting for instructions over a given time period. Thus, load only makes sense as an average over a particular time period. During any single clock tick, the CPU is either processing an instruction or a HLT.
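
The distinction is easy to see directly in .NET: Stopwatch measures wall-clock time, while Process.TotalProcessorTime reports the CPU time actually spent executing the process’s instructions. A minimal sketch:

```csharp
using System;
using System.Diagnostics;
using System.Threading;

class CpuVsWallClock
{
    static void Main()
    {
        var wallClock = Stopwatch.StartNew();
        var cpuBefore = Process.GetCurrentProcess().TotalProcessorTime;

        Thread.Sleep(1000);  // waiting: wall-clock time passes, CPU time barely moves
        BusyWork();          // executing: both clocks advance

        var cpuUsed = Process.GetCurrentProcess().TotalProcessorTime - cpuBefore;
        wallClock.Stop();

        // Wall-clock time includes the sleep; CPU time counts only the busy loop.
        Console.WriteLine($"Wall clock: {wallClock.ElapsedMilliseconds} ms");
        Console.WriteLine($"CPU time:   {cpuUsed.TotalMilliseconds} ms");
    }

    static void BusyWork()
    {
        double x = 0;
        for (int i = 0; i < 50000000; i++) x += Math.Sqrt(i);
    }
}
```

Here the wall-clock figure will be at least a second larger than the CPU figure, because the sleep consumes elapsed time but no processor time.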

Idle time

The CPU processes a set of instructions fed from memory. The memory contains a set of opcodes (commands) which tell the CPU which operation to perform and where in memory its operands are located. The rate of clock ticks is constant – several gigahertz in a modern CPU – and during each clock tick, the CPU processes either an instruction to do some work, or a HLT – an opcode which tells the CPU to keep its components idle until the next cycle.

Monitoring CPU time

So if we were to “observe” a CPU at the clock-tick level of detail, we would see that it’s always either working or waiting. But we can’t actually do that, since the more closely we monitor clock cycles, the more clock cycles are needed to do the monitoring. So to get an accurate picture of activity, we have to step back to a level where the monitoring tool does not interfere too much with the process being observed.

Input/output overhead

Any given computer task has several components: CPU instructions, memory IO, disk IO, network IO, and many others. Only extremely simple tasks (like calculating π) can fit in the small (but very fast) memory buffers on the CPU itself. Real-world tasks will almost always require the CPU to wait for other components to finish shuffling data into a state where it is ready for the CPU to work on. But the ideal scenario is for the CPU to be kept as busy as possible (a constant 100% load) until the task is completed. This is the minimum possible time in which a task can be completed. If we were to monitor CPU activity during such a task, we’d see the load jump to 100% during the task, then fall back to the baseline when it’s done. But if the task is shorter than the sampling interval of the monitoring tool, the spike would be averaged over one or two measurement periods, or it might not be detected at all.

Multitasking

The situation is more complicated when there are multiple tasks to run, each of which requires some fraction of CPU time. The operating system will run the tasks concurrently. Both the processing time and the IO of the tasks will overlap, but because the operating system takes turns running each task, the total CPU load will be some combination which may be higher or lower than the sum of the individual tasks – depending on the other competing resources the tasks use. For example, two disk-intensive tasks will use less CPU than the sum of both running individually, because disk IO will be the bottleneck. But two CPU-intensive tasks will consume more than double the CPU work, because in addition to running both tasks the CPU must handle the context switching between them.

Further reading

Performance considerations for the Entity Framework execution pipeline

An ORM provides value by doing a lot of things for you – presenting database tables as native objects and converting types automatically. But the work the ORM does to save developer effort has a cost of its own – an inherent performance penalty, and it may encourage some bad development practices.

An ORM by definition has to do some work to convert a database schema into a native view. ORMs can try to minimize this performance penalty by defining the transformation at design time or by caching the database-to-object transformation at different points.

Entity Framework is built on top of ADO.Net, which is just an API that does not “know” anything about the database. To convert .Net code into SQL queries and back into CLR objects, Entity Framework performs a set of operations at different stages: (1) compile time, (2) first run, and (3) each execution. To improve performance, EF allows you to shift some work from step 3 to steps 1 or 2. But the details can be tricky, and understanding the best optimization strategy requires understanding the EF query execution pipeline.

The EF execution pipeline

The following information is based on this MSDN EF Performance page.

There are six steps I want to comment on in EF execution:

  1. loading metadata
  2. generating views
  3. preparing the query
  4. executing the query
  5. tracking
  6. materializing objects

1: Loading metadata: The metadata is the mapping defined in your EDMX file. This is a very expensive operation (see the breakdown here), but it only happens once. EF applications use more memory and have a warm-up penalty, which can be ignored in performance analysis if you are not concerned with startup times. Models with a very large number of entities (200+) have a number of problems – see this and this.

2: Generating views: The local query views are static objects which are cached per application domain. This is also an expensive operation but it can be pre-generated and embedded in the application if you care about first run times.

3: Preparing the query: Each unique query must be compiled into the EF version of a stored procedure before it is executed. Microsoft says that the commands are cached for later executions, but I’m confused about what exactly is cached, because profiling shows that the query is compiled every time. In any case, Microsoft suggests caching the compiled query to avoid this penalty.

Caching precompiled queries is somewhat unwieldy, so it is only advisable in performance-critical contexts, but it makes a big difference for frequently executed queries.
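
As a sketch of what precompilation looks like with the ObjectContext API – the context, entity, and property names here are hypothetical:

```csharp
using System;
using System.Data.Objects;   // System.Data.Entity.Core.Objects in EF6
using System.Linq;

static class CustomerQueries
{
    // Compiled once per application domain and reused on every call,
    // skipping the LINQ-to-Entity-SQL translation step (step 3 above).
    private static readonly Func<MyEntities, string, IQueryable<Customer>> ByCity =
        CompiledQuery.Compile((MyEntities ctx, string city) =>
            ctx.Customers.Where(c => c.City == city));

    public static IQueryable<Customer> GetByCity(MyEntities ctx, string city)
    {
        return ByCity(ctx, city);
    }
}
```

The unwieldy part is that every parameterized query shape needs its own static delegate like this, which is why it is best reserved for hot paths.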

(Note: Take care when using .Count() or Any() to avoid unnecessary query recompilation and avoid unnecessary enumeration when a simple boolean check will do.)
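
To illustrate the note above (entity names hypothetical): Count() forces the database to count every matching row, while Any() translates to an EXISTS check that stops at the first match.

```csharp
// Forces the database to count every matching row:
bool hasOrders = ctx.Orders.Count(o => o.CustomerId == id) > 0;

// Translates to an EXISTS query that can stop at the first match:
bool hasOrdersFast = ctx.Orders.Any(o => o.CustomerId == id);
```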

By the way, all the steps above can be skipped by using direct SQL queries or stored procedures with EF. The performance of direct entity SQL falls between pure ADO.Net and Entity Framework queries, so it is only useful as the occasional exception to LINQ queries against a data store.
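
With the DbContext API, a direct SQL query looks something like the following (table and type names hypothetical). The SQL string bypasses the LINQ translation steps, but the results are still materialized as objects:

```csharp
// Bypasses query preparation: the SQL is sent as-is to the provider.
var recent = ctx.Database.SqlQuery<Customer>(
    "SELECT * FROM Customers WHERE CreatedAt > @p0", cutoffDate).ToList();
```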

4: Executing the query: Query execution time depends on the underlying data source. Entity Framework is just a library built on top of ADO.Net, so pure ADO.Net queries are the benchmark for any ORM built on top of it. Because ADO.Net will always be faster than any framework built on top of it, it can be used as a fallback when other options have been exhausted.

5: Tracking: Tracking is used to track changes for updates. If you only need to read data, you can get a small performance boost by disabling it.
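
With the DbContext API, tracking can be disabled per query (entity names hypothetical):

```csharp
// AsNoTracking skips the change-tracking bookkeeping for read-only results.
var report = ctx.Customers
    .AsNoTracking()
    .Where(c => c.IsActive)
    .ToList();
```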

6: Materializing objects: Each object returned from the database must be converted into a class instance to be used. There is no way to avoid this penalty, but it is worthwhile to keep in mind that the less data there is, the faster it will be materialized – not to mention transferred over the wire.

For best performance, queries should be as specific as possible. I have noticed that ORMs encourage the bad habit of always selecting an entire row. This is because developers using an ORM tend to think of the database as an object repository rather than as a relational store, whereas raw SQL queries encourage manually selecting the needed columns. (Unless one has the awful habit of “select *”.) So to minimize overhead, pull just the data you need (the MVVM pattern is helpful in this regard).
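
Projecting into a smaller shape keeps both the transfer and the materialization cost down (entity and property names hypothetical):

```csharp
// Instead of materializing full Customer entities...
var fullRows = ctx.Customers.ToList();

// ...project only the columns the view actually needs:
var slim = ctx.Customers
    .Select(c => new { c.Id, c.Name })
    .ToList();
```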

Final thoughts:

As the ADO.NET program manager himself has pointed out, Entity Framework is inherently slower than raw ADO.Net SQL queries. But performance should always be balanced against productivity. There are some applications which are definitely not suitable for an ORM, and many others that are. Some operations can only be done using raw SQL, or would take forever using an ORM (“truncate table”, temp tables, etc.). I think the best approach is to use an ORM where appropriate and optimize in the specific scenarios where performance is inadequate. I do think that Entity Framework is the best, simplest, and safest choice of all the .Net ORMs.

These are just a few tips on Entity Framework performance. There are other tips and much more information in the pages below and the links therein.

More reading: