Saturday, February 12, 2011

Ordering numbers that are stored as strings in the database

I have a bunch of records in several tables in a database that have a "process number" field, that's basically a number, but I have to store it as a string both because of some legacy data that has stuff like "89a" as a number and some numbering system that requires that process numbers be represented as number/year.

The problem arises when I try to order the processes by number. I get stuff like:

  • 1
  • 10
  • 11
  • 12

And the other problem is when I need to add a new process. The new process' number should be the biggest existing number incremented by one, and for that I would need a way to order the existing records by number.

Any suggestions?

  • Maybe this will help.

    Essentially:

    SELECT process_order FROM your_table ORDER BY process_order + 0 ASC
    
  • Can you store the numbers as zero padded values? That is, 01, 10, 11, 12?

    Farinha : Hhmm, I guess so. That might be a solution...
    From Andrew
  • I would suggest to create a new numeric field used only for ordering and update it from a trigger.

    From Panos
  • You need to cast your field as you're selecting. I'm basing this syntax on MySQL - but the idea's the same:

    select * from table order by cast(field AS UNSIGNED);
    

    Of course UNSIGNED could be SIGNED if required.

    Farinha : That breaks when I have "numbers" like "62a" and "12/98", like I explained in the question.
    From Remy Sharp
  • Can you split the data into two fields?

    Store the 'process number' as an int and the 'process subtype' as a string.

    That way:

    • you can easily get the MAX processNumber - and increment it when you need to generate a new number
    • you can ORDER BY processNumber ASC, processSubtype ASC - to get the correct order, even if multiple records have the same base number with different years/letters appended
    • when you need the 'full' number you can just concatenate the two fields

    Would that do what you need?

  • Given that your process numbers don't seem to follow any fixed patterns (from your question and comments), can you construct/maintain a process number table that has two fields:

    create table process_ordering ( processNumber varchar(N), processOrder int )
    

    Then select all the process numbers from your tables and insert into the process number table. Set the ordering however you want based on the (varying) process number formats. Join on this table, order by processOrder and select all fields from the other table. Index this table on processNumber to make the join fast.

    select my_processes.*
    from my_processes
      inner join process_ordering on my_process.processNumber = process_ordering.processNumber
    order by process_ordering.processOrder
    
    From tvanfosson
  • It seems to me that you have two tasks here.

    • Convert the strings to numbers by legacy format/strip off the junk
    • Order the numbers

    If you have a practical way of introducing string-parsing regular expressions into your process (and your issue has enough volume to be worth the effort), then I'd

    • Create a reference table such as
    
    CREATE TABLE tblLegacyFormatRegularExpressionMaster(
        LegacyFormatId int,
        LegacyFormatName varchar(50),
        RegularExpression varchar(max)
    )
    
    • Then, with a way of invoking the regular expressions, such as the CLR integration in SQL Server 2005 and above (the .NET Common Language Runtime integration to allow calls to compiled .NET methods from within SQL Server as ordinary (Microsoft extended) T-SQL, then you should be able to solve your problem.

            • See
              http://www.codeproject.com/KB/string/SqlRegEx.aspx


    I apologize if this is way too much overhead for your problem at hand.

  • Suggestion:

    • Make your column a fixed width text (i.e. CHAR rather than VARCHAR).

    • Pad the existing values with enough leading zeros to fill each column and a trailing space(s) where the values do not end in 'a' (or whatever). • Add a CHECK constraint (or equivalent) to ensure new values conform to the pattern e.g. something like

    CHECK (process_number LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][ab ]')
    

    • In your insert/update stored procedures (or equivalent), pad any incoming values to fit the pattern.

    • Remove the leading/trailing zeros/spaces as appropriate when displaying the values to humans.

    Another advantage of this approach is that the incoming values '1', '01', '001', etc would all be considered to be the same value and could be covered by a simple unique constraint in the DBMS.

    BTW I like the idea of splitting the trailing 'a' (or whatever) into a separate column, however I got the impression the data element in question is an identifier and therefore would not be appropriate to split it.

    From onedaywhen

0 comments:

Post a Comment