Thursday, April 14, 2011

What component do I need to monitor my internet traffic on my PC?

I would like to be able to see and monitor my internet data (http/emule/email) on my own PC using Windows XP. I am thinking of something like WireShark but I would like to control it programmatically.

I would be using C or C++.

How can I do this?

From stackoverflow
  • WireShark uses winpcap to do its thing.

    Winpcap comes with a C interface (see the sketch after these answers).

    : is it the only method?
  • winpcap is probably the most well known choice, but you could also write a Layered Service Provider. There's not a whole lot of documentation, but a good place to start is the article on msdn: http://www.microsoft.com/msj/0599/LayeredService/LayeredService.aspx

    This has some advantages vs layer 2 packet parsing, but also some disadvantages. You'll need to evaluate where in the stack you want to live.

    edit: when I say pcap, I really mean any similar approach; winpcap was obviously not the first driver/library combo to provide this sort of information.

  • Have a look at the code of the tool trafficWatcher. It uses WinPCap to distinguish between LAN and internet traffic.

    The source code is available here.
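
    To give a flavor of the WinPcap/libpcap C interface mentioned above, here is a minimal capture sketch. The device name "eth0" and the fixed packet count are placeholders (on Windows you would use a name returned by pcap_findalldevs()), and it only counts bytes rather than parsing protocols.

    #include <stdio.h>
    #include <pcap.h>

    /* Called by pcap_loop() once per captured packet. */
    static void on_packet(u_char *user, const struct pcap_pkthdr *h,
                          const u_char *bytes)
    {
        (void)user; (void)bytes;
        printf("captured %u bytes\n", h->caplen);
    }

    int main(void)
    {
        char errbuf[PCAP_ERRBUF_SIZE];

        /* 65535 = snapshot length, 1 = promiscuous, 1000 ms read timeout. */
        pcap_t *p = pcap_open_live("eth0", 65535, 1, 1000, errbuf);
        if (p == NULL) {
            fprintf(stderr, "pcap_open_live: %s\n", errbuf);
            return 1;
        }

        pcap_loop(p, 10, on_packet, NULL);  /* capture 10 packets */
        pcap_close(p);
        return 0;
    }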

What is the correct way of reading from a TCP socket in C/C++?

Here's my code:

// Not all headers are relevant to the code snippet.
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>
#include <cstdlib>
#include <cstring>
#include <unistd.h>

char *buffer;
stringstream readStream;
bool readData = true;

while (readData)
{
 cout << "Receiving chunk... ";

 // Read a bit at a time, eventually "end" string will be received.
 bzero(buffer, BUFFER_SIZE);
 int readResult = read(socketFileDescriptor, buffer, BUFFER_SIZE);
 if (readResult < 0)
 {
  THROW_VIMRID_EX("Could not read from socket.");
 }

 // Concatenate the received data to the existing data.
 readStream << buffer;

 // Continue reading while end is not found.
 readData = readStream.str().find("end;") == string::npos;

 cout << "Done (length: " << readStream.str().length() << ")" << endl;
}

It's a little bit of C and C++ as you can tell. The BUFFER_SIZE is 256 - should I just increase the size? If so, what to? Does it matter?

I know that if "end" is not received for whatever reason, this will be an endless loop, which is bad - so if you could suggest a better way, please also do so.

From stackoverflow
  • Where are you allocating memory for your buffer? The line where you invoke bzero invokes undefined behavior since buffer does not point to any valid region of memory.

    char *buffer = new char[ BUFFER_SIZE ];
    // do processing
    
    // don't forget to release
    delete[] buffer;
    
  • Several pointers:

    You need to handle a return value of 0, which tells you that the remote host closed the socket.

    For nonblocking sockets, you also need to check for an error return value (-1) and make sure that errno isn't EAGAIN (or EWOULDBLOCK), which simply means no data is available yet.

    You definitely need better error handling - you're potentially leaking the buffer pointed to by 'buffer'. Which, I noticed, you don't allocate anywhere in this code snippet.

    Someone else made a good point about how your buffer isn't a null terminated C string if your read() fills the entire buffer. That is indeed a problem, and a serious one.

    Your buffer size is a bit small, but should work as long as you don't try to read more than 256 bytes, or whatever you allocate for it.

    If you're worried about getting into an infinite loop when the remote host sends you a malformed message (a potential denial of service attack) then you should use select() with a timeout on the socket to check for readability, and only read if data is available, and bail out if select() times out.

    Something like this might work for you:

    fd_set read_set;
    struct timeval timeout;
    
    timeout.tv_sec = 60; // Time out after a minute
    timeout.tv_usec = 0;
    
    FD_ZERO(&read_set);
    FD_SET(socketFileDescriptor, &read_set);
    
    int r=select(socketFileDescriptor+1, &read_set, NULL, NULL, &timeout);
    
    if( r<0 ) {
        // Handle the error
    }
    
    if( r==0 ) {
        // Timeout - handle that. You could try waiting again, close the socket...
    }
    
    if( r>0 ) {
        // The socket is ready for reading - call read() on it.
    }
    

    Depending on the volume of data you expect to receive, the way you scan the entire message repeatedly for the "end;" token is very inefficient. This is better done with a state machine (the states being 'e'->'n'->'d'->';') so that you only look at each incoming character once (see the sketch at the end of this answer).

    And seriously, you should consider finding a library to do all this for you. It's not easy getting it right.
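
    As a sketch of that state-machine idea (illustrative, not from the original answer): track how many characters of "end;" have matched so far, and advance or reset that count per incoming byte. Because all four token characters are distinct, a simple reset is sufficient.

    #include <cstddef>
    #include <cstdio>

    // Tracks how many characters of "end;" have matched so far, so each
    // incoming byte is inspected exactly once, even across chunk boundaries.
    class TerminatorScanner {
    public:
        bool feed(const char *data, std::size_t len) {
            static const char token[] = "end;";
            for (std::size_t i = 0; i < len; ++i) {
                if (data[i] == token[matched]) {
                    if (++matched == 4) return true;  // full "end;" seen
                } else {
                    // Restart; the current byte may begin a new token.
                    matched = (data[i] == token[0]) ? 1 : 0;
                }
            }
            return false;  // terminator not seen yet, keep reading
        }
    private:
        std::size_t matched = 0;
    };

    int main() {
        TerminatorScanner scanner;
        // Simulate the terminator being split across two read() chunks.
        bool done = scanner.feed("...data...en", 12);
        if (!done) done = scanner.feed("d;", 2);
        std::printf("terminator found: %s\n", done ? "yes" : "no");
        return 0;
    }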

  • Without knowing your full application it is hard to say what the best way to approach the problem is, but a common technique is to use a header which starts with a fixed length field, which denotes the length of the rest of your message.

    Assume that your header consists only of a 4-byte integer which denotes the length of the rest of your message. Then simply do the following.

    // This assumes buffer is at least x bytes long,
    // and that the socket is blocking.
    void ReadXBytes(int socket, unsigned int x, void* buffer)
    {
        unsigned int bytesRead = 0;
        int result;
        while (bytesRead < x)
        {
            // Cast to char*: pointer arithmetic on void* is not legal C++.
            result = read(socket, (char*)buffer + bytesRead, x - bytesRead);
            if (result < 1)
            {
            {
                // Throw your error.
            }
    
            bytesRead += result;
        }
    }
    

    Then later in the code

    unsigned int length = 0;
    char* buffer = 0;
    // we assume that sizeof(length) will return 4 here.
    ReadXBytes(socketFileDescriptor, sizeof(length), (void*)(&length));
    buffer = new char[length];
    ReadXBytes(socketFileDescriptor, length, (void*)buffer);
    
    // Then process the data as needed.
    
    delete [] buffer;
    

    This makes a few assumptions:

    • ints are the same size on the sender and receiver.
    • Endianness is the same on both the sender and receiver.
    • You have control of the protocol on both sides
    • When you send a message you can calculate the length up front.

    Since it is common to want to explicitly know the size of the integers you are sending across the network, define them in a header file and use them explicitly, such as:

    // These typedefs will vary across different platforms
    // such as linux, win32, OS/X etc, but the idea
    // is that a Int8 is always 8 bits, and a UInt32 is always
    // 32 bits regardless of the platform you are on.
    // These vary from compiler to compiler, so you have to 
    // look them up in the compiler documentation.
    typedef char Int8;
    typedef short int Int16;
    typedef int Int32;
    
    typedef unsigned char UInt8;
    typedef unsigned short int UInt16;
    typedef unsigned int UInt32;
    

    This would change the above to:

    UInt32 length = 0;
    char* buffer = 0;
    
    ReadXBytes(socketFileDescriptor, sizeof(length), (void*)(&length));
    buffer = new char[length];
    ReadXBytes(socketFileDescriptor, length, (void*)buffer);
    
    // process
    
    delete [] buffer;
    

    I hope this helps.

    grieve : Ori Pessach's comment is a good complementary answer to this one.
  • If you actually create the buffer as per dirkgently's suggestion, then:

      int readResult = read(socketFileDescriptor, buffer, BUFFER_SIZE);
    

    may completely fill the buffer, possibly overwriting the terminating zero character which you depend on when extracting to a stringstream. You need:

      int readResult = read(socketFileDescriptor, buffer, BUFFER_SIZE - 1 );
    
  • This is an article that I always refer to when working with sockets.

    THE WORLD OF SELECT()

    It will show you how to reliably use 'select()' and contains some other useful links at the bottom for further info on sockets.

  • 1) Others (especially dirkgently) have noted that buffer needs to be allocated some memory space. For smallish values of N (say, N <= 4096), you can also allocate it on the stack:

    #define BUFFER_SIZE 4096
    char buffer[BUFFER_SIZE];
    

    This saves you the worry of ensuring that you delete[] the buffer should an exception be thrown.

    But remember that stacks are finite in size (so are heaps, but stacks are finiter), so you don't want to put too much there.

    2) On a -1 return code, you should not simply return immediately (throwing an exception immediately is even more sketchy.) There are certain normal conditions that you need to handle, if your code is to be anything more than a short homework assignment. For example, EAGAIN may be returned in errno if no data is currently available on a non-blocking socket. Have a look at the man page for read(2).

    nbolton : Good point, it's not good to leave open socket handles lying around; will consider throwing afterward.
    Dan Breslau : Actually, I didn't address the open socket handle, because it's not opened in the snippet that you posted. But I'm glad you thought of it :-)
  • I have found this guide to be most helpful.

Migrating to WCF from ASP.Net 2.0 web services

I currently have a suite of .Net web services that are developed in ASP.Net 2.0. They don't utilize any WS-E extensions, rather they implement security at the application level. They are fairly straightforward data retrieval/update features. I am interested in re-factoring these web services gradually over time into WCF services, mostly to future-proof them against the deprecation of old framework versions. My main concern is the amount of re-coding that will need to be done in the client applications to re-connect with these services.

  1. Has anyone gone down this road already?
  2. Was it worth it?
  3. Can you recommend any reading materials that I can use to continue my research in this area?
From stackoverflow
  • Your concerns are valid. I've been taking the approach of 'if it ain't broke, don't fix it' and leaving existing 'legacy' web services alone until they need work. For new development we have been adopting WCF as the solution to communication problems big and small. The most tangible benefit is the ease of integration if you have a nice clean object model. The most tangible negative is that (relatively speaking) there are many more configuration headaches, especially when you start trying to secure things.

    I find it interesting you mentioned the WS-* spec as that is where we had the most problems integrating WCF with a Java system. The security features are just plain missing right now, and we ended up using Microsoft.Web.Services3 (WSE3) for that one piece of the .NET side of things.

    I would highly recommend reading through these articles; even if you don't do security with WCF right now, you probably will have to at some point.

    John Saunders : Could you elaborate on which security features are missing from WCF? I didn't know it was missing any at all.
    slf : Specifically when dealing with UsernameToken on a custom binding we couldn't find a way to set MustUnderstand to false. After a couple of days of troubleshooting WCF we found WSE3 supported this out of the box and used it simply out of respect to the project schedule.
  • It is quite possible that your clients won't need to change at all.

    On the server side, I've recently seen an ASMX service that was turned into a WCF service simply by placing [ServiceContract] on the [WebService] class, and [OperationContract] on the [WebMethod] methods, and by turning the non-primitive types it returned into [DataContract] and [DataMember].
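
    As an illustrative sketch of those attribute changes (the type and member names here are invented, not from the original service):

    using System.Runtime.Serialization;
    using System.ServiceModel;
    using System.Web.Services;

    // The class keeps [WebService]/[WebMethod] for existing ASMX clients
    // and gains [ServiceContract]/[OperationContract] for WCF.
    [ServiceContract]
    public class ProductService : WebService
    {
        [OperationContract]
        [WebMethod]
        public ProductData GetProduct(int id)
        {
            return new ProductData { Name = "Sample #" + id };
        }
    }

    [DataContract]
    public class ProductData
    {
        [DataMember]
        public string Name { get; set; }
    }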

Selecting a good dictionary key

I have an object that I want to use to look up other objects. I will be using a Dictionary<TKey, TValue>().

The key object has two strings that uniquely identify it, say KeyObj.Str1 and KeyObj.Str2.

What do you recommend that I use as the key for the dictionary?

1: The concatenation of the strings.

Dictionary<String, TValue>();
Key = KeyObj.Str1:KeyObj.Str2; ("somestring:anotherstring")

2: A unique integer for each object to identify it?

Dictionary<int, TValue>();
KeyObj.ID = _nextID++;
Key = KeyObj.ID;

3: A reference to the object.

Dictionary<KeyObj, TValue>();
Key = KeyObj;

Option 3 would be the easiest, but it seems like it would be inefficient to index a dictionary based on reference values.

If the key object contained a single unique string, the obvious choice would be use that, but having two strings that are only unique in combination makes it more difficult.

From stackoverflow
  • You don't need to use a new class as the dictionary key. Use a struct instead, as it will be much more lightweight... and have it consist of those two string values, obviously.

    Josh G : The class that I am using is more complicated than I described in the example... I simplified it to keep it clear. I don't want to make it a struct.
  • If performance is a major consideration, you could consider using a hash value of the two strings. But then your 'value' field would have to contain both the keys and the value.

    I have a reference to another SO question, I just have to find it.

    http://stackoverflow.com/questions/658065/is-it-faster-to-search-for-a-large-string-in-a-db-by-its-hashcode/658187

    But that question is more DB oriented. And the performance is considered for thousands of iterations.

  • I would say option 1.

  • Concatenating them is probably the best idea. You can expose a property in the KeyObj object that does the concatenation so you don't have to perform it each time you're accessing the dictionary value.

    Edit:

    I apparently misread the question. I think what you really want to do is a mix of 1 and 3, you can override Equals() and GetHashCode() to use the strings that uniquely identify the object (just make sure they are immutable!)

    public override bool Equals(object obj) 
    {
       if (obj == null || !(obj is KeyObj))
          return false;
       KeyObj other = (KeyObj)obj;
       if (this.Key1 == other.Key1 && this.Key2 == other.Key2)
         return true;
       return false;
    }
    
    public override int GetHashCode()
    {
        return (this.Key1 + this.Key2).GetHashCode();
    }
    

    Then you can use the 3rd option you suggested:

    Dictionary<KeyObj, ValueObj>...
    
  • what about using the KeyObj.GetHashCode()?

    Josh G : This sounds promising...
    Groo : According to MSDN: The default implementation of the GetHashCode method does not guarantee unique return values for different objects.
    Groo : (therefore the actual question here was how to implement it)
  • Any of them are valid, but I'm assuming you'd want to be able to quickly find these objects based on one of the two strings, so using an int as the key would mean you'd still have to scan the values to find the object you wanted.

    Are the strings both unique, or only when combined? If they're both unique, and you're willing to trade a bit of space, you could do:

    dict.Add(KeyObj.Str1, KeyObj);
    dict.Add(KeyObj.Str2, KeyObj);
    

    and have two references to the object in the dictionary, using each unique string as a key. Or, you could always just combine the strings if they're only unique together, and it'll use the hashcode internally to look them up.

    Josh G : They are only unique when combined.
  • Concatenated strings should work best.

    If you know that their combination is unique, then that is what you should choose; remember that a hash code is usually unique, but not always.

  • Remember that a dictionary is a glorified hash table, so the key (no pun intended) is to use a key that will result in very few (if any) collisions with another key. I'd lean toward #3, but that's assuming the KeyObj type has a good hash value generator.

    Groo : I wouldn't say so, because all keys must be unique in a dictionary.
    Josh G : Does the Dictionary class implicitly use KeyObj.GetHashCode() for comparing the reference objects?
    Groo : It actually uses a default implementation of EqualityComparer (if you don't specify one). It uses the GetHashCode result to speed up the search (by creating several buckets), but at the end it uses the Equals method to make sure that they are identical.
  • You could use option 3 if you can override GetHashCode() and Equals() appropriately, i.e. something like this:

        public override int GetHashCode()
        {
            return str1.GetHashCode() ^ str2.GetHashCode();
        }
    
        public override bool Equals(object obj)
        {
            if (!(obj is KeyObj))
            {
                return false;
            }
    
            KeyObj key = (KeyObj)obj;
            return this.str1.Equals(key.str1) && this.str2.Equals(key.str2);
        }
    
    Josh G : This is a good option for making #3 work. Thanks.

Unable to decode unicode string in Python 2.4

This is in python 2.4. Here is my situation. I pull a string from a database, and it contains an umlauted 'o' (\xf6). At this point if I run type(value) it returns str. I then attempt to run .decode('utf-8'), and I get an error ('utf8' codec can't decode bytes in position 1-4).

Really my goal here is just to successfully make type(value) return unicode. I found an earlier question that had some useful information, but the example from the picked answer doesn't seem to run for me. Is there something I am doing wrong here?

Here is some code to reproduce:

Name = 'w\xc3\xb6rner'.decode('utf-8')
file.write('Name: %s - %s\n' %(Name, type(Name)))

I never actually get to the write statement, because it fails on the first statement.

Thank you for your help.

Edit:

I verified that the DB's charset is utf8. So in my code to reproduce I changed '\xf6' to '\xc3\xb6', and the failure still occurs. Is there a difference between 'utf-8' and 'utf8'?

The tip on using codecs to write to a file is handy (I'll definitely use it), but in this scenario I am only writing to a log file for debugging purposes.

From stackoverflow
  • You need to use "ISO-8859-1":

    Name = 'w\xf6rner'.decode('iso-8859-1')
    file.write('Name: %s - %s\n' %(Name, type(Name)))
    

    UTF-8 uses two or more bytes for anything outside ASCII, but here it's just one byte, so ISO-8859-1 is probably correct.

  • Your string is not in UTF-8 encoding. If you want to decode a string to unicode, it must be in the encoding you specify as the parameter. I tried this and it works perfectly:

    print 'w\xf6rner'.decode('cp1250')
    

    EDIT

    For writing unicode strings to the file you can use codecs module:

    import codecs
    f = codecs.open("yourfile.txt", "w", "utf8")
    f.write( ... )
    

    It is handy to specify the encoding of the input/output once and use unicode strings throughout your code without worrying about different encodings.

  • It's obviously a 1-byte encoding. 'ö' in UTF-8 is '\xc3\xb6'.

    The encoding might be:

    • ISO-8859-1
    • ISO-8859-2
    • ISO-8859-13
    • ISO-8859-15
    • Win-1250
    • Win-1252
  • So in my code to reproduce I changed '\xf6' to '\xc3\xb6', and the failure still occurs

    Not in the first line it doesn't:

    >>> 'w\xc3\xb6rner'.decode('utf-8')
    u'w\xf6rner'
    

    The second line will error out though:

    >>> file.write('Name: %s - %s\n' %(Name, type(Name)))
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 7: ordinal not in range(128)
    

    Which is entirely what you'd expect, trying to write non-ASCII Unicode characters to a byte stream. If you use Jiri's suggestion of a codecs-wrapped stream you can write Unicode directly, otherwise you will have to re-encode the Unicode string into bytes manually.

    Better, for logging purposes, would be simply to spit out a repr() of the variable. Then you don't have to worry about Unicode characters being in there, or newlines or other unwanted characters:

    name= 'w\xc3\xb6rner'.decode('utf-8')
    file.write('Name: %r\n' % name)
    
    Name: u'w\xf6rner'
    

How to get whole text of an Element in xml.minidom?

I want to get the whole text of an Element to parse some xhtml:

<div id='asd'>
  <pre>skdsk</pre>
</div>

Given E = the div element in the above example, I want to get

<pre>skdsk</pre>

How?

From stackoverflow
  • Strictly speaking:

    from xml.dom.minidom import parse, parseString
    tree = parseString("<div id='asd'><pre>skdsk</pre></div>")
    root = tree.firstChild
    node = root.childNodes[0]
    print node.toxml()
    

    In practice, though, I'd recommend looking at the http://www.crummy.com/software/BeautifulSoup/ library. Finding the right childNode in an xhtml document, and skipping "whitespace nodes", is a pain. BeautifulSoup is a robust html/xhtml parser with fantastic tree-search capabilities (a short sketch follows this answer).

    Edit: The example above compresses the HTML into one string. If you use the HTML as in the question, the line breaks and so-forth will generate "whitespace" nodes, so the node you want won't be at childNodes[0].
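
    As an illustrative sketch of the BeautifulSoup approach (BeautifulSoup 3, matching the Python 2.4 era; the snippet is an assumption, not from the original answer):

    from BeautifulSoup import BeautifulSoup

    soup = BeautifulSoup("<div id='asd'>\n  <pre>skdsk</pre>\n</div>")
    div = soup.find('div', id='asd')
    print div.pre               # the first <pre> element inside the div
    print div.renderContents()  # everything inside the div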

Localization Problem: Decimal Separator

Is it possible that .NET uses ',' as the decimal separator on server A and '.' on server B? And how can you detect this?

When converting strings to doubles, on server A everything works fine, but on server B we have problems.

Example:

server A : 20,4 --> 20.4
server B : 20,4 --> 204

We would need to detect this so that on both servers things keep on working.

thx, Lieven Cardoen

From stackoverflow
  • Sounds like the locale is being set correctly on one server but not the other.

    There are a few ways you could do this, but you might want to set the locale for the current thread in your App/ASP.NET page using Thread.CurrentThread.CurrentCulture and Thread.CurrentThread.CurrentUICulture.

    To set this for the entire application, you'd do this in your web.config:

    <configuration>
      <system.web>
        <globalization culture="en-US" uiCulture="en-US" />
    ...
    
    Lieven Cardoen : Can you set this for the overall asp.net application instead of for each thread separately?
    Eric Petroelje : Yup! Updated my answer.
    Eric Petroelje : I guess in your case it might be either "nl-BE" or "fr-BE", but you get the idea.
  • I use

    String.Format(System.Globalization.CultureInfo.InvariantCulture, ....)
    

    in those cases where you do not want to have culture specific output. (For example, when you write serialize data for other applications etc.) This way no matter what Culture is currently set, the output will always be the same.

My team member added a reference to a third party DLL and did a check-in...now I can't build.

My team member made a reference to a DLL on his local drive (e.g., C:\mystuff\thirdparty.dll) and checked in the project with the reference. I refreshed my local copy of the project and now I have a broken reference: <The system cannot find the reference specified>. I have the thirdparty.dll on my local machine, but it is not in the same directory my team member had it.

Should we be checking in thirdparty.dll itself into source control under our solution in a subdirectory using the relative path? When another developer gets the project, will it pull down the DLL and work automatically even if the developer didn't "install" the DLL beforehand?

Thanks.

From stackoverflow
  • If you create a "lib" or "assemblies" directory under the solution, or at least in the source control project, and add all third-party assemblies and reference them there, it should eliminate many problems like yours.

    Generally, you won't have to use the third-party installer, but that will depend on the product.

  • See the answer I just gave to this question:

    http://stackoverflow.com/questions/666597/tfs-c-vs2008-how-to-deal-with-external-assemblies/666623#666623

    Basically, you need to add the DLL into source control, and reference it using file reference, such that the relative file path to the dll is the same for all developers on your team.

  • From my experience the best way is to set some relative pathing to the referenced DLL in a common directory. For example, given the following structure:

    • Big Project
      • Project 1
      • Project 2
      • ...
      • Project N
      • Bin

    I have all my projects compile into the bin folder, and all projects reference their dependencies from the bin folder. Anyone can pull down the project to ANY location and compile the first time.

Required Field Validator, displaying on initial page load

I have a simple textbox with a required field validation control attached to it, with errors displayed in a validation summary at the bottom of the page. Everything works great, except that the validation seems to fire on the page's initial load, which obviously sets off the required validation and displays the error message.

How do I set this control to only validate after the form has been submitted?

From stackoverflow
  • It sounds like you have code in your page load like this:

    if (!Page.IsValid) //...
    

    What you really want is this:

    if (Page.IsPostBack && !Page.IsValid) //...
    
    Cerebrus : Good catch, Joel! That's most probably the reason.

How to retrieve data from the html table?

Hi, I will generate an HTML table dynamically with some textboxes and dropdownlists, and users will enter their input there. How do I read these data in the controller? (MVC)

From stackoverflow
  • Hi Krish,

    There is a relatively quick and easy way to get data from a well-formatted HTML table using the Microsoft .NET Framework Data Provider for OLE DB. Essentially, this allows you to specify a connection string which points at the page with your HTML table on it.

    Here's an example connection string:

    *Provider=Microsoft.Jet.OLEDB.4.0;Data Source=http://www.your_web_address.com/yourHTMLTablePage.htm;Extended Properties="HTML Import;HDR=YES;IMEX=1"*

    You can then use the OleDbConnection and OleDbCommand objects to get to the data contained within the table. You can find these inside the namespace: System.Data.OleDb

    I hope this helps?

    /Richard.

  • This will generate 5 textboxes; I want to read the data entered in these text boxes when I click the submit button.

    Thanks in advance

qt design issue

I'm trying to design an interface like this one: http://www.softpedia.com/screenshots/FlashFXP_2.png

I'm using Qt Designer and programming with Python. On the left it's a treeWidget, but what is on the right side? Every time I change the selection in the tree, all the widgets are replaced...

thanks :p

From stackoverflow
  • Use QStackedWidget. You insert several widgets which correspond to the pages. Changing the active item in the tree should switch the active widget/page inside the stacked widget.
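
    For illustration, a minimal sketch in C++ with Qt (using Qt 5 signal/slot syntax; the question uses Python, but the idea maps directly to PyQt, and the page names here are placeholders):

    #include <QApplication>
    #include <QHBoxLayout>
    #include <QLabel>
    #include <QStackedWidget>
    #include <QTreeWidget>

    int main(int argc, char **argv)
    {
        QApplication app(argc, argv);
        QWidget window;

        // The navigation tree on the left.
        QTreeWidget tree;
        tree.setHeaderHidden(true);
        tree.addTopLevelItem(new QTreeWidgetItem(QStringList("General")));
        tree.addTopLevelItem(new QTreeWidgetItem(QStringList("Transfer")));

        // One page per tree item, stacked on the right.
        QStackedWidget pages;
        pages.addWidget(new QLabel("General settings page"));
        pages.addWidget(new QLabel("Transfer settings page"));

        // Switch the visible page whenever the tree selection changes.
        QObject::connect(&tree, &QTreeWidget::currentItemChanged,
                         [&](QTreeWidgetItem *current, QTreeWidgetItem *) {
                             if (current)
                                 pages.setCurrentIndex(tree.indexOfTopLevelItem(current));
                         });

        QHBoxLayout *layout = new QHBoxLayout(&window);
        layout->addWidget(&tree, 1);
        layout->addWidget(&pages, 2);
        window.show();
        return app.exec();
    }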

HP smart array p400 on ubuntu

I installed an HP Smart Array P400 and am getting the following problem: the system does the partitioning and a full installation without a glitch; however, once the server reboots it goes to a black screen with a blinking cursor.

Does anyone have an idea of what could be causing this?

  • No USB devices are plugged in.
  • The RAID controller is plugged into a riser card, but even without it, it does not work.
  • I also tried to install SUSE Server, but it does not even see the hard drives.

From stackoverflow
  • Edit: Are you sure grub is set up on the correct drive in the boot sector? Try following these steps to ensure it's set up correctly, and make sure your BIOS boots to the first drive.

    It could be an X Server problem. When the grub menu loads hit [Esc] and select the Recovery Mode option. When that loads, select the "Attempt to fix the X server" (or similar) option.

    Are there any noticeable errors in dmesg? One of the things that bothers me about Ubuntu is that there are no clear/concise boot logs like some other distros have.

    Also, not programming related. But figured I would help you out.

    Sergio Rodriguez : The grub menu does not load; it stays on a black screen with a blinking cursor.
  • Linux did not like the 8 TB partition. I had to repartition into a smaller volume for the root, and the rest was mounted as /var.

Importing explicitly instantiated template class from dll.

Being a DLL newbie I have to ask the almighty SO about something.

Say I explicitly instantiate a template class like this:

template class __declspec(dllexport) B<int>;

How do I use import this templated class again?

I've tried adding the code below in the .cpp file where I want to use B:

template class __declspec(dllimport) B<int>;
From stackoverflow
  • When you instantiate a template fully, you have a complete type. It is no different from any other type. You need to include the header for B, and also either link at compile time with a lib file or load the DLL dynamically, to link to the definition.

    Have you read this article: http://support.microsoft.com/kb/168958 ?

    Here's a brief summary of what I tested (and it worked):


    Create a dummy DLL project

    • Used the Win32 Console application wizard to generate the dll header/source files called: template_export_test
    • Added the following:

    file: template_export_test.h


    // EXP_STL should be defined only when building the DLL
    // (normally via the project settings rather than unconditionally):
    #ifndef EXP_STL
    #define EXP_STL
    #endif 
    
    #ifdef EXP_STL
    #    define DECLSPECIFIER __declspec(dllexport)
    #    define EXPIMP_TEMPLATE
    #else
    #    define DECLSPECIFIER __declspec(dllimport)
    #    define EXPIMP_TEMPLATE extern
    #endif
    
    EXPIMP_TEMPLATE template class DECLSPECIFIER CdllTest<int>;
    


    file: template_export_test.cpp


    #include "template_export_test.h"
    #include <iostream>

    template<class T>
    CdllTest<T>::CdllTest(T t)
    : _t(t)
    {
        std::cout << _t << ": init\n";
    }
    


    Create the test application

    • Use the wizard to create a Win32 Console application called: driver
    • Edit the Linker project settings of this project:
      • Add to Linker > General > Additional Library Directories: path to template_export_test.lib
      • Add to Linker > Input > Additional Dependencies: template_export_test.lib
    • Include the template_export_test.h in the main cpp file


    #include "c:\Documents and Settings\...\template_export_test.h"
    using namespace std;
    
    int main(int argc, char** argv) {
        CdllTest<int> c(12);
    }
    


    • Compile and go!
    Nailer : I've tried: extern template class __declspec(dllimport) B<int>; Didn't work.
  • It seems that, even with explicit instantiation of the template, problems may arise that lead to run-time errors. Take a look at this interesting article on C4251 (especially the "Conclusion").

Query about working out whether number is a power of 2

Using the classic code snippet:

if ((x & (x-1)) == 0)

If the answer is 1, then it is false and not a power of 2. However, working on 5 (not a power of 2) and 4 results in:

0001 1111
0001 1111
0000 1111

That's 4 1s.

Working on 8 and 7:

1111 1111
0111 1111

0111 1111

The 0 is first, but we have 4.

In this link (http://www.exploringbinary.com/ten-ways-to-check-if-an-integer-is-a-power-of-two-in-c/) for both cases, the answer starts with 0 and there is a variable number of 0s/1s. How does this answer whether the number is a power of 2?

From stackoverflow
  • You need to refresh yourself on how binary works. 5 is not represented as 0001 1111 (5 bits on), it's represented as 0000 0101 (2^2 + 2^0), and 4 is likewise not 0000 1111 (4 bits on) but rather 0000 0100 (2^2). The numbers you wrote are actually in unary.

    Wikipedia, as usual, has a pretty thorough overview.

    dotnetdev : Good link. That helps me understand how the solution to this problem works. :)
  • Any power-of-two number can be represented in binary with a single 1 and multiple 0s.

    eg. 
    10000(16) 
    1000(8) 
    100(4)
    

    If you subtract 1 from any power of two number, you will get all 1s to the right of where the original one was.

    10000(16) - 1 = 01111(15)
    

    ANDing these two numbers will give you 0 every time.

    In the case of a non-power of two number, subtracting one will leave at least one "1" unchanged somewhere in the number like:

    10010(18) - 1 = 10001(17)
    

    ANDing these two will result in

    10000(16) != 0
    
  • Keep in mind that if x is a power of 2, there is exactly 1 bit set. Subtract 1, and you know two things: the resulting value is not a power of two, and the bit that was set is no longer set. So, when you do a bitwise and &, every bit that was set in x is clear in (x-1), and every bit that is set in (x-1) is clear in x. The and of each bit pair is therefore always 0.

    In other words, whenever x is a power of two, you are guaranteed that (x&(x-1)) is zero (see the sketch below).
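
    Putting that together, a minimal sketch of the test in C (note the explicit zero guard, since 0 & (0 - 1) is also 0):

    #include <stdio.h>

    /* A power of two has exactly one bit set, so clearing its lowest
       set bit (x & (x - 1)) must yield zero. */
    int is_power_of_two(unsigned int x)
    {
        return x != 0 && (x & (x - 1)) == 0;
    }

    int main(void)
    {
        unsigned int x;
        for (x = 1; x <= 9; ++x)
            printf("%u -> %s\n", x, is_power_of_two(x) ? "yes" : "no");
        return 0;
    }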

How can I tell a database to only return data in discrete portions?

I couldn't find a proper discussion thread on this topic, so I'm going to go ahead and ask here.

Problem: I have a select query that returns a result of size 100,000+. The user wants to view all this data, but obviously I can't give it to him all at once. I also don't want to store so much data on the client's memory. I want the user to be able to "page through" the results, being able to view the data in pages of 500 records.

So, how can I ask the database to only send me back 500 records at a time?

From stackoverflow
  • You can use a combination of ROW_NUMBER() and TOP (at least in SQL Server); see the sketch below.
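
    For illustration, a sketch of that approach (SQL Server 2005 and later; the table name MyTable and key column Id are placeholders). This returns the second page of 500 rows:

    SELECT *
    FROM (
        SELECT *, ROW_NUMBER() OVER (ORDER BY Id) AS RowNum
        FROM MyTable
    ) AS Numbered
    WHERE RowNum BETWEEN 501 AND 1000;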

  • This depends on the database you are using, but in MySql you could try something like:

    SELECT * FROM MyTable LIMIT <start>, 500
    

    and replace <start> with the index you would like to start on (e.g. 0 for the first page, 500 for the second page).

Is there any reason to write terse code in Java/C#/C++?

Did you ever find yourself writing terse code in Java, C# or C++?

If so, why? Do you think there are any situations in which this should be acceptable, given the situations in which these languages are used?

From stackoverflow
  • Barring all other concerns, shorter code is better, since you can see more of it at once.

  • Try to always write clear code. Sometimes clear code is terse, but often it's not.

    Why? Because in 6 months' time you'll need to understand what you were trying to achieve. The faster you can do that, the better.

  • It depends on your definition of 'terse'.

    If you mean 'short and to the point', it closely matches my vision for good code.

    If you mean 'cryptic', then there's a problem.

    Richard : +1: Agree 200% on the matches good code.
  • It depends on what exactly you mean by "terse". I certainly like to write concise code which expresses exactly what I want to achieve in the simplest possible manner. For example, I love the way LINQ lets me express a data pipeline rather than the "old" way of writing loops to transform or filter collections, or find the largest value etc. That's just duplicate code which should be in a template method somewhere.

    On the other hand, shorter code isn't always more readable than longer code. The conditional operator is the subject of controversy on this front. Is:

    Foo x = null;
    if (condition)
    {
        x = y;
    }
    else
    {
        x = z;
    }
    

    more or less readable than:

    Foo x = condition ? y : z;
    

    Well, if "condition", "y" and "z" are all fairly simple, the conditional operator wins. If you have to go through hoops to make "y" and "z" single expressions where executing multiple statements would be more readable, then the if/else form is likely to be more readable.

    In short, I write the most readable code I can. That's often, but not always, terse code.

    Mario Ortegón : almost, but not quite, entirely unlike terse code?
    Richard : /me checks watch... time for a cuppa.
  • Assuming you're using the term "terse" with the "cryptic" connotation:

    Unless it's an obfuscated coding contest, I see no point in writing terse code in a compiled language. I might have written terse C++ code only in my own private projects. Never in code that someone else is going to see.

    Otherwise, terse (in the "to-the-point" sense) code is better than verbose code.

  • Remember that code is read more often than it's written, and keep your readers in mind when writing (the reader could even be you). Don't write code like you assume the reader is stupid, nor write code that assumes that the less of it there is, the better it is.

    Write 'short and to the point' like Joel Coehoorn suggests.

  • @j_random_hacker (can't add comments to a comment yet)

    It has happened to me that after 6 months I found it hard to decipher a piece of code I wrote. So indeed, this part matters too.

    Mario Ortegón : "Who wrote this unreadable piece of crap"... (checks source control) "Upps, it was me...."
    Adrian Pascalin : true, true... I've heard of an SVN function called 'blame', which does exactly this thing, though I haven't used it up to now.
  • When writing the actual source code, be as robust as possible (without sacrificing performance). For example:

    var result = GetResultFromFoo(GetOtherResult()).DoSomeCalculation(CallAnotherMethod());
    

    could be fun to write, but good luck debugging that sucker. There's no benefit; just break it down to this:

    var result = GetOtherResult();
    var fooResult = GetResultFromFoo(result);
    var anotherMethodResult = CallAnotherMethod();
    var finalResult = fooResult.DoSomeCalculation(anotherMethodResult);
    

    Much easier to debug.

    The only time I can see a reason to write as terse code as possible is if the size of the source code matters (like in a JavaScript file that's being served up hundreds of times a second) or when purposely trying to obfuscate code; however, there is usually software out there that does that for you. So bottom line, IMO, no, there really isn't ever much of a reason for it.

  • Code should be as terse as necessary and no more. :)

    Flippant remarks aside there are several factors affecting just how terse (or otherwise) it should be:

    • Lifespan.

      • Often longer than you think :)
    • Probability of bugs.

      • How likely is dependent on many things but the original coder has a big part to play in this.
    • Computers reading (parsing) it.

      • Not just the compiler, think intellisense and the like.
    • Humans reading it

      • sometimes raw, sometimes with the help of programs like diff tools.
      • sometimes not the same person that wrote it.
    • Required performance characteristics.

    All these things combine to produce a set of sometimes competing forces which may want more or less verbosity.

    Balancing these is the key to effective development. Which are more important is totally dependent on the problem your software is trying to solve.

    First off lets take the easy bits:

    Computers.

    When they read your code they are quite capable of doing so irrespective of the verbosity. They might be a little slower but this is something that is normally hard to measure (it is unlikely you will go beyond 1 or two orders of magnitude of verbosity than the minimum theoretical possibility). Notable exceptions are where you are (ab)using something like meta programming via a preprocessor to do lots of expansion for you. This can take a long time when compiling. Here you must decide if this trade off is worth it.

    Humans.

    Generally they will be people with similar context to you, reading the source in a similar situation to when you wrote it. This means that if the function is in a file/class/module called Foo, there is no need to go putting Foo in front of things; the Foo aspect should be quite clear from context. This makes changing that aspect easier in future.

    Programmers familiar with the idioms of the language/style of programming you are using will be quite capable of understanding several constructs which are extremely terse. Loop index variables called 'i' for example are as terse as you can get but are normally not a problem until your loop becomes large.
    Here you see an interesting dichotomy. The value of terseness is often proportional to the complexity of the block of code within which it resides. As this block becomes more terse the variables within it benefit more from being shrunk. By writing code in functions/classes with limited responsibility it becomes easier and more helpful to keep things terse as there is less scope for confusion on the part of a human. Paradoxically this can lead to the need for the context to be more explicit, thus longer method and class names.

    Lifespan

    The lifespan and probability of bugs factor into how often you will have to either read the code or debug through it. Many debuggers support break points at multiple positions on a line (correctly spotting where there are two statements) but some do not. Therefore, if you intend to break within a line a lot, take care to make sure you can place and control those break points with minimal effort.

    If the code has a low probability of bugs but a long lifespan you have another interesting situation. The probability of the code being comprehensible when you come to need to change it is much lower (you will have a worse memory or may not even be there any more). This code therefore will benefit from being slightly less terse than normal.

    Performance

    On occasion you might have to sacrifice a compact but clear representation of something to satisfy a performance goal, perhaps you must bit pack for example, never a nice thing to read in code but unavoidable if you have to fit in a certain amount of memory. Occasions like these are hopefully rare.

    General Concepts

    Some language constructs can encourage terse code (automatic properties, anonymous inner classes, lambdas to name but a few). Where these concepts make sense, use them judiciously. The idea is that they reduce boilerplate and expose intent.

    If you do the same thing repeatedly and have a certain amount of code duplication consider a shared function/class/module but remember that if you must make the shared code less clear (say an additional if statement or unused variables in one of the code paths) then you may not have a net win.

    Type inference is powerful but remember that the compiler is sometimes much better at it than a human. If you are saying flibble x = new flibble() then var x = new flibble() is no stretch at all (and gets better as flibble gets bigger). Compare with var flibble = SomeMethodWhoseReturnTypeIsNotClear(). Common sense helps here, if you would never have to use intellisense to work it out you certainly should consider using it.

    Some other useful (and terse) rules of thumb:

    • Multiple actions on a single line often confuse humans.
    • Side effects often confuse humans (++x or x++ don't matter at all conceptually unless part of a wider expression for example)
    • Indentation helps most humans to infer structure far more than brackets do
    • Precedence rules are often 'internalized' by humans rather than remembered as a set of rules. This means that needless (to the compiler) bracketing can be harmful for readability where the idiom is common but useful where the usage is not so common.
    • Logical not is often a single character. When using this in an if statement consider whether it can be restructured to not require it. This may not be possible (if the resulting contortions to variable/method name or ordering of code outweigh the removal leave it in)
    David Berger : +1 for not being terse.
    Yuval A : vote++ and accepted, great answer :)
    ShuggyCoUk : @David Good point :)
    TofuBeer : It was as long as it needed to be :-P
  • There is one mechanical disadvantage to verbose code and naming in java: Memory footprint under certain JVMs. This may only be a perception issue around process virtual size and memory mapped jar files, but it does exist. Big jar files == big (perceived) memory usage.

    You probably have to take things to extremes for this to be measurable in regards to terseness, but it is an interesting "real" effect.

    In terms of actual advice, like others have said, write good code first, and worry about optimization later.

    David Thornley : Are there implementations where writing verbose source code will translate into significantly larger .jar files than terse code doing the same thing?
    TREE : Significantly? probably not, as I tried to imply. Measurably? sure.
  • I'm a huge fan of code that I can read. I've seen some "terse" code that looks like:

    int af = yb; 
    while(af<ex){af++;af*=4;}
    

    It's easy to see what is being done programmatically, but WHAT is actually being done in terms of meaning is obscure. I'd much rather have variable names that I can read later than save a couple characters of code by going with terse names. Good code does not always mean shortest. Good code is about good algorithms, good documentation, good maintainability. I don't care how long the code is; if it has all those properties, it's good code.

  • I have found that readability to a human is the most important feature for code in the LONG run.

    Readability includes being concise, accurate and clear-minded. If you mix things together, you confuse your reader. If your code is clear to read, you don't need many comments. If you break complex things down into local, well-named variables, you help the reader immensely while implicitly documenting what you do.

    Well-written code will survive you. Strive for perfection :)

How to run Eclipse launch configurations programmatically?

I'm finding it difficult to phrase this question well, as there are quite a few generic terms (run, configuration, launch, etc.). Here goes:

You can save run configurations in a .launch file (in the Run Configurations dialog, under the Common tab, save as a shared file).

We check these in to SVN. The developers can pass them around, and it helps getting new devs running a working application quicker.

I'd like to check these out as part of our build and use them to programmatically run the application, the tests, etc., without spinning up the whole IDE.

What would be the best way to run a .launch file outside of the UI?

Edit: I am trying to unify the tests run on the build server and the IDE. I do not particularly want to give up integrated debugging, which would be the case with an Ant script to run the tests.

This is probably more a problem for integration testing with multiple bundles, or unit testing a whole bundle, where you'd like to mock up extensions.

From stackoverflow
  • Ant4Eclipse may provide a good starting point on how to do this.

    Unfortunately, this is limited to Java Applications and JUnit configurations; I am more interested in PDE applications and Plugin JUnit tests.

  • There is an Eclipse plugin built over JUnit, called TPTP. It provides an automation client which can be used to launch the tests from Eclipse with no GUI. Maybe it helps.

  • I think you don't need to use the .launch configurations to run the tests. If you build an application using the Eclipse Build System, then you can use the AntRunner application from Eclipse to run your units tests. This doesn't start the whole IDE.

    This article describes how to run the tests during your build process. With this process, you use a special "Test" Eclipse and load the plugins you want to test.

    jamesh : The problem I'm trying to avoid is that of keeping the build/test/deploy metadata all in synch. If I can get everything running from the same set of metadata...
  • Does anyone have a solution to this question?

    I tried with ant4eclipse but I was unable to run anything.

  • Perhaps running the configurations the way you would run your own custom run configurations would help here. It is described in this article.

    jamesh : This article allows us to define our own launch configuration types. We want to be able run existing configurations. Unfortunately, these launch configuration types (plus the framework itself) are all in org.eclipse.jdt.debug.ui, so not really suitable for a headless build server.

How to check what URL a RedirectToRouteResult will make?

In my asp.net-mvc project I do a redirect from a post request to a get request.
Between my redirect and its arrival at the method I expect it to reach, one of my parameters magically turns into null, and I can't figure out why.
Probably it has something to do with my global.asax (route definition).

The only way I can come up with to debug this is with the route debugger library. But I don't see how I can use it with a RedirectToRoute.
Any suggestions?

From stackoverflow
  • A redirect is a result sent to the browser, and then the browser honors the redirect by doing a GET on the new URL. Therefore, look at the browser to see what the URL is. When the browser receives the redirect, it will do a GET on the new URL, which you can see with Firebug, Fiddler, or the tool of your choice.

    Inside the new action, when it is called, you can also examine Request.Url.

WPF WrapPanel/StackPanel with DataTemplate?

Well, the title says it all: what must I do to use a DataTemplate in a WrapPanel or StackPanel?

On a ListBox it's so easy, but I can't find a way to do it on a Panel...

Edit: What i want is a ListBox that places the items like a WrapPanel.

From stackoverflow
  • There isn't a DataTemplate property for those panels.

    You can however use a ControlTemplate for these panels, and set these in the Template property of these panels...

    HTH

  • If I understand you correctly, you can use the ItemsPanel property of your container. I used something similar to make my ItemsControl layout using a horizontal StackPanel:

    <ItemsControl.ItemsPanel>
        <ItemsPanelTemplate>
            <StackPanel Orientation="Horizontal" />
        </ItemsPanelTemplate>
    </ItemsControl.ItemsPanel>
    

    So, more specifically for your case:

    <ListBox.ItemsPanel>
        <ItemsPanelTemplate>
            <WrapPanel />
        </ItemsPanelTemplate>
    </ListBox.ItemsPanel>
    

SELECT All that are not in another table

Hi All,

I've inherited a less-than-ideal table structure, and I'm trying to improve it as much as I can without tearing it down and rebuilding. There are currently at least two levels of data for everything: the legacy data and the marketing override data. I'm trying to find all the records within the legacy data that don't yet have a marketing override.

So far, this is what I have:

SELECT DISTINCT 
  old.STYLE_NBR, old.COLOR_NBR 
FROM 
  LEGACY_PRODUCT_TABLE old
INNER JOIN 
  MARKETING_PRODUCT_TABLE new
ON old.STYLE_NBR <> new.style_number AND old.COLOR_NBR <> new.colour_number

This seems to work, but it takes a few minutes to run. If at all possible, I'd like a more efficient way of doing this.

Other info:

  • There are about 60,000 records in the legacy table
  • There are about 7,000 in the marketing table
  • Both STYLE_NBR and COLOR_NBR are char(5) and, when combined, make a unique ID.
From stackoverflow
  • Are the join fields indexed? That should speed things up considerably. Make sure old.STYLE_NBR, old.COLOR_NBR, new.style_number, and new.colour_number are indexed (see the sketch below).
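
    For example (hypothetical index names; composite indexes covering both join columns):

    CREATE INDEX IX_Legacy_Style_Color
        ON LEGACY_PRODUCT_TABLE (STYLE_NBR, COLOR_NBR);
    CREATE INDEX IX_Marketing_Style_Colour
        ON MARKETING_PRODUCT_TABLE (style_number, colour_number);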

  • I don't know if this will be faster, but it may be worth a try.

    SELECT DISTINCT 
      old.STYLE_NBR, old.COLOR_NBR 
    FROM 
      LEGACY_PRODUCT_TABLE old
    -- Note: row-value (tuple) IN syntax is not supported by every database.
    WHERE (old.STYLE_NBR, old.COLOR_NBR)
    NOT IN 
    (
        SELECT old.STYLE_NBR, old.COLOR_NBR 
        FROM LEGACY_PRODUCT_TABLE old
        INNER JOIN 
            MARKETING_PRODUCT_TABLE new
            ON 
                old.STYLE_NBR = new.style_number AND old.COLOR_NBR = new.colour_number
    )
    
  • You should use a LEFT OUTER JOIN and change your lookup:

    SELECT DISTINCT 
      old.STYLE_NBR, old.COLOR_NBR 
    FROM 
      LEGACY_PRODUCT_TABLE old
      LEFT OUTER JOIN MARKETING_PRODUCT_TABLE new
        ON (old.STYLE_NBR + old.COLOR_NBR) = (new.style_number + new.Colour_number)
    WHERE (new.style_number + new.Colour_number) IS NULL
    
  • SELECT 
        old.* 
    FROM 
        LEGACY_PRODUCT_TABLE old 
    LEFT JOIN
        MARKETING_PRODUCT_TABLE new 
    ON 
        new.style_number=old.STYLE_NBR AND 
        new.colour_number=old.COLOR_NBR 
    WHERE 
        new.style_number IS NULL;
    

    Stands a better chance of using the indexes which you presumably have on the four columns in question.

  • What about NOT EXISTS?

    SELECT DISTINCT old.STYLE_NBR, old.COLOR_NBR 
    FROM LEGACY_PRODUCT_TABLE old
    WHERE NOT EXISTS
        (SELECT 1 FROM MARKETING_PRODUCT_TABLE new 
        WHERE old.STYLE_NBR = new.style_number 
          AND old.COLOR_NBR = new.colour_number)
    
  • Some options to try are:

    SELECT
        old.STYLE_NBR,
        old.COLOR_NBR
    FROM  
        LEGACY_PRODUCT_TABLE old
    LEFT OUTER JOIN
        MARKETING_PRODUCT_TABLE new
    ON
        old.STYLE_NBR = new.style_number
    AND
        old.COLOR_NBR = new.colour_number
    WHERE
        new.style_number IS NULL
    
    
    
    SELECT
        old.STYLE_NBR,
        old.COLOR_NBR
    FROM  
        LEGACY_PRODUCT_TABLE old
    WHERE
        NOT EXISTS
    (
        SELECT
         *
        FROM
         MARKETING_PRODUCT_TABLE new
        WHERE
         old.STYLE_NBR = new.style_number
        AND
         old.COLOR_NBR = new.colour_number
    )
    

    EDIT: The key thing with both of these is that you are joining using = rather than <>.

  • What you currently have is incorrect because it will return a row for every pair that doesn't match: potentially 6,999 rows in the result per legacy row if there is a marketing override, or 7,000 if there isn't. The distinct will then discard the duplicates, but the result will still be wrong, because even when a legacy row has a matching marketing row, the non-matching marketing rows will make sure it appears in the result set anyway.

    Try this instead:

    select distinct style_nbr, color_nbr
     from legacy_product_table L
    where not exists
    (
       select * from marketing_product_table m
       where m.style_nbr = L.style_nbr and m.color_nbr = L.color_nbr
    )
    

    Make sure the product table has an index on (style_nbr,color_nbr).

    HLGEM : I prefer the left join method, but I like the way you explained why the original was incorrect.
  • What about EXCEPT? (if this is SQL Server 2005 or 2008)

    select old.Style_NBR, old.Color_NBR from LEGACY_PRODUCT_TABLE old
    except
    select new.Style_NBR, new.Color_NBR from MARKETING_PRODUCT_TABLE new
    

    Try the code below in MSSQL 2008:

    declare @Old table ( Color_Nbr tinyint, Style_Nbr tinyint )
    declare @New table ( Color_Nbr tinyint, Style_Nbr tinyint )
    
    insert into @Old values (1,1), (2,2), (3,3), (4,4)
    insert into @New values (1,1), (2,2), (3,3), (5,5)
    
    select o.Color_Nbr, o.Style_Nbr from @Old o
    except
    select n.Color_Nbr, n.Style_Nbr from @New n