Types
Casting is a way to convert a value of one type to a compatible type. Sometimes the conversion is exact; other times information is lost. Enumeration and parsing in q also fit into the cast pattern.
The type Operator
The non-atomic unary function type can be applied to any entity in q to return its data type expressed as a short. It is a “feature” of q that the data type of an atom is negative whereas the type of a simple list is positive.
q)type 42
-7h
q)type 10 20 30
7h
q)type 98.6
-9h
q)type 1.1 2.2 3.3
9h
q)type `a
-11h
q)type `a`b`c
11h
q)type "z"
-10h
q)type "abc"
10h
Type of a Variable
The type of a variable is the type of the value associated with the variable’s name.
q)a:42
q)type a
-7h
q)a:"abc"
q)type a
10h
q)value `.
a| "abc"
Cast
Since q is dynamically typed, casting occurs at run-time using the binary operator $, which is atomic in both operands. The right operand is the source value and the left operand specifies the target type. There are three ways to specify the target type:
- A (positive) numeric short type value
- A char type value
- A type name symbol
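As a minimal sketch, assuming a fresh q session, the three forms can be compared side by side for one target type (int). All three are expected to produce the same value.

```q
q)6h$42      / numeric short type value
42i
q)"i"$42     / char type value
42i
q)`int$42    / type name symbol
42i
q)(6h$42)~"i"$42
1b
```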
Casts that Widen
In these examples, no information is lost in the cast, because the target type can represent the source value exactly. Here are examples using the short type specification in the target.
q)7h$42i / int to long
42
q)6h$42 / long to int
42i
q)9h$42 / long to float
42f
It is arguably most readable to use the symbolic type name.
q)`int$42
42i
q)`long$42i
42
q)`float$42
42f
Casts across Disparate Types
The underlying value of a char is its position in the ASCII collation sequence, so we can cast char to and from integers, provided the integer is less than 256.
q)`char$42
"*"
q)`long$"\n"
10
The underlying value of a date is its count of days from the millennium, so we can cast to and from an int.
q)`date$0
2000.01.01
q)`int$2001.01.01 / millennium occurred on leap year
366i
Casts that Narrow
Some casts lose information. This includes the usual suspects of float to integer and wider integers to narrower ones.
q)`long$12.345
12
q)`short$123456789
32767h
Cast any numeric to a boolean using the C philosophy that zero is 0b and anything else is 1b.
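The zero-versus-nonzero rule can be checked directly. This is a small sketch, assuming a fresh q session; the last line relies on cast being atomic in the right operand.

```q
q)`boolean$0
0b
q)`boolean$0.0
0b
q)`boolean$123
1b
q)`boolean$-12.345
1b
q)`boolean$0 1 2
011b
```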
We can also extract constituents from complex types.
q)`date$2015.01.02D10:20:30.123456789
2015.01.02
q)`year$2015.01.02
2015i
q)`month$2015.01.02
2015.01m
q)`mm$2015.01.02
1i
q)`dd$2015.01.02
2i
q)`hh$10:20:30.123456789
10i
q)`minute$10:20:30.123456789
10:20
q)`uu$10:20:30.123456789
20i
q)`second$10:20:30.123456789
10:20:30
q)`ss$10:20:30.123456789
30i
Casting Integral Infinities
When integral infinities are cast to integers of wider type, their underlying bit patterns are reinterpreted. Since these bit patterns are legitimate values for the wider type, the cast results in a finite value.
q)`int$0Wh
32767i
q)`int$-0Wh
-32767i
q)`long$0Wi
2147483647
q)`long$-0Wi
-2147483647
Coercing Types
Assignment into a simple list must strictly match the type of the list. Assigning a short into a list of longs, for example, fails.
q)L:10 20 30 40
q)L[1]:42h
'type
This situation can arise when the list and the assignment value are created dynamically. Coerce the type by casting it to that of the target, provided of course that the cast is legitimate.
q)L[1]:(type L)$42h
q)L,:(type L)$43h
q)L
10 42 30 40 43
Cast is Atomic
Cast is atomic in the right operand.
q)"i"$10 20 30
10 20 30i
q)`float$(42j; 42i; 42j)
42 42 42f
Cast is atomic in the left operand.
q)`short`int`long$42
42h
42i
42
q)"ijf"$98.6
99i
99
98.6
Cast is atomic in both operands simultaneously.
q)"ijf"$10 20 30
10i
20
30f
Data to and from Text
A q string is a simple list of char.
Data to Strings
The function string can be applied to any q entity to produce a text representation suitable for console display or storage in a file. Here are the key features of string.
- The result is always a list of char, never a single char. Thus you will see singleton char lists from single digits.
- The result contains no q type indicators or other decorations. In general, the result is the most compact representation of the input, which may not actually be convertible (i.e., parsed) back to the original value.
- Applying string to an actual string (i.e., list of char) probably will not give you what you want.
q)string 42
"42"
q)string 4
,"4"
q)string 42i
"42"
q)a:2.0
q)string a
,"2"
q)f:{x*x}
q)string f
"{x*x}"
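The point about compact representations can be checked with a round trip. The following is a minimal sketch, assuming a fresh q session; it uses the built-in value to evaluate the text, which is one way (not the only way) to parse it back. The float comes back as a long because the type indicator was dropped.

```q
q)a:2.0
q)type a
-9h
q)string a
,"2"
q)value string a
2
q)type value string a
-7h
```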
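The string function is not atomic, since it always returns a list of char, but it acts like an atomic function in that it is applied item-wise down to the atoms.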
q)string 1 2 3
,"1"
,"2"
,"3"
q)string "string"
,"s"
,"t"
,"r"
,"i"
,"n"
,"g"
q)string (1 2 3; 10 20 30)
,"1" ,"2" ,"3"
"10" "20" "30"
q)string `Life`the`Universe`and`Everything
"Life"
"the"
"Universe"
"and"
"Everything"
Creating Symbols from Strings
To cast a char or a string to a symbol, use `$.
q)`$"abc"
`abc
q)`$"Hello World"
`Hello World
You can include any characters in a symbol this way but you may need to escape them into the string.
q)`$"Zaphod \"Z\""
`Zaphod "Z"
q)`$"Zaphod \n"
`Zaphod
The unary `$ is atomic and will thus convert an entire list (or column) of strings to symbols.
q)`$("Life";"the";"Universe";"and";"Everything")
`Life`the`Universe`and`Everything
Parsing Data from Strings
The $ operator is overloaded to parse strings into data of any type exactly as the q interpreter does. This overload is invoked by using an uppercase type char as the target left operand and a string in the right operand. If the specified parse cannot be performed, a null of the target type is returned.
q)"J"$"42"
42
q)"F"$"42"
42f
q)"F"$"42.0"
42f
q)"I"$"42.0"
0Ni
q)"I"$" "
0Ni
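Parsing also applies item-wise to a list of strings, which is how a text column is typically converted. The sketch below assumes a fresh q session and made-up input strings; the unparseable item yields a null rather than an error.

```q
q)"J"$("10";"22";"xy")
10 22 0N
q)"S"$"abc"
`abc
q)"H"$"42"
42h
q)null "I"$"42.0"
1b
```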
Date parsing is flexible with respect to the format of the date.
q)"D"$"12.31.2014"
2014.12.31
q)"D"$"12-31-2014"
2014.12.31
q)"D"$"12/31/2014"
2014.12.31
q)"D"$"12/1/2014"
2014.12.01
q)"D"$"2014/12/31"
2014.12.31
Creating Typed Empty Lists
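An empty general list () takes on the type of the first item appended to it. To fix the type in advance, cast the empty list to the desired type; subsequent appends must then match that type.
q)c1:`float$()
q)c1,:98.6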
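The difference between an untyped and a typed empty list can be sketched as follows, assuming a fresh q session. The names L and c1 are only illustrative.

```q
q)L:()
q)type L
0h
q)L,:4.5          / the first append decides the type
q)type L
9h
q)c1:`float$()
q)type c1
9h
q)c1,:42          / a long does not match the float list
'type
q)c1,:98.6
q)c1
,98.6
```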
Notice that an operation that yields a simple list retains the type on an empty result.
q)0#10 20 30
`long$()
Enumerations
Traditional Enumerations
To begin, recall that in traditional languages, an enumerated type is a way of associating a series of names with a corresponding set of integral values. Often the sequence of numbers is consecutive and begins with 0. The association is usually given a name and represents a new type.
A traditional enumerated type serves multiple purposes.
- It allows a descriptive name to be used instead of an arbitrary number — e.g., ‘red’, ‘green’, ‘blue’ instead of 0, 1 and 2.
- It enables type checking to ensure that only permissible values are supplied – e.g., choosing a color name from a list instead of remembering its number is less prone to error.
- It provides namespacing, meaning the same name can be reused in different domains without fear of confusion — e.g., color.blue and note.blue (the flatted fifth).
There is also a subtler, more powerful use of enumerations: normalizing data.
Data Normalization
Broadly speaking, data normalization seeks to eliminate duplication, retaining only the minimum required data. In the archetypal example, suppose you know that you will have a list of text entries taken from a fixed and reasonably short set of values. Rather than storing every entry in full, keep the distinct values once in a list u and replace each entry of v with its index in u.
q)v:`ccccccc`bbbbbbb`aaaaaaa`ccccccc`ccccccc`bbbbbbb
q)u:distinct v
q)u
`ccccccc`bbbbbbb`aaaaaaa
q)k:u?v
q)k
0 1 2 0 0 1
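No information is lost by this normalization: indexing the unique list with the index list recovers the original. A short sketch, reusing the names v, u and k from the example:

```q
q)v:`ccccccc`bbbbbbb`aaaaaaa`ccccccc`ccccccc`bbbbbbb
q)u:distinct v
q)k:u?v
q)u k
`ccccccc`bbbbbbb`aaaaaaa`ccccccc`ccccccc`bbbbbbb
q)v~u k
1b
```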
Enumerating Symbols
The process of converting a list of symbols to the equivalent list of indices described in the previous section is called enumeration in q.
It uses (yet another overload of) $ with the name of the variable holding the unique symbols as the left operand and a list of symbols drawn from that domain on the right.
q)`u$v
`u$`ccccccc`bbbbbbb`aaaaaaa`ccccccc`ccccccc`bbbbbbb
q)`int$(`u$v)
0 1 2 0 0 1i
Using Enumerated Symbols
We continue with the example of the previous section, renamed to use the standard sym domain.
q)sym:`g`aapl`msft`ibm
q)v:1000000?sym
q)ev:`sym$v
q)ev
`sym$`g`g`msft`aapl`msft`aapl`msft`ibm`msft`aapl`g`ibm`aapl`msft`msft`aapl`g`..
q)`int$ev
0 0 2 1 2 1 2 3 2 1 0 3 1 2 2 1 0 1 2 0 1 1 2 3 0 1 2 1 2 1 0 0 1 2 1 2 3 3 0..
The enumerated ev can be substituted for the original v in nearly all situations.
q)v[3]
`aapl
q)ev[3]
`sym$`aapl
q)v[3]:`ibm
q)ev[3]:`ibm
q)v=`ibm
000100010010011101000010010100000000100100000001000000001100001001011..
q)ev=`ibm
000100010010011101000010010100000000100100000001000000001100001001011..
q)where v=`aapl
4 5 19 20 21 31 33 34 41 42 43 49 58 59 61 74 81 83 90 94 95 98 114..
q)where ev=`aapl
4 5 19 20 21 31 33 34 41 42 43 49 58 59 61 74 81 83 90 94 95 98 114..
q)v?`aapl
4
q)ev?`aapl
4
q)v in `ibm`aapl
000111010010011101011110010100010110100101110001010000001111011001011..
q)ev in `ibm`aapl
000111010010011101011110010100010110100101110001010000001111011001011..
While the enumerated version is item-wise equal to the original, the entities are not identical.
q)all v=ev
1b
q)v~ev
0b
This is because match (~) compares types as well as values, and an enumerated list has a different type from a plain symbol list.
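The type difference is easy to see on a small list. This is a minimal sketch, assuming a fresh q session and a short made-up v in place of the million-item list.

```q
q)sym:`g`aapl`msft`ibm
q)v:`g`msft`g`ibm
q)ev:`sym$v
q)type v
11h
q)type ev
20h
q)v~value ev
1b
```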
Type of Enumerations
Each enumeration is assigned a new numeric data type, beginning with 20h. Starting with q version 3.2, the type 20h is reserved for the conventional enumeration domain sym, whether you use it or not (you should). The types of other enumerations you create will begin with 21h and proceed sequentially. The convention of negative type for atoms and positive type for simple lists still holds. In a fresh q session we see the following.
q)sym1:`g`aapl`msft`ibm
q)type `sym1$1000000?sym1
21h
q)sym2:`a`b`c
q)type `sym2$`c
-22h
Updating an Enumerated List
The normalization provided by an enumeration reduces updating all occurrences of a given value to a single operation. This can have significant performance implications for large lists with many repetitions. Continuing with our example above, suppose the list sym contains the items in a stock index and we wish to change one of the constituents. A single update to sym suffices.
q)sym:`g`aapl`msft`ibm
q)ev:`sym$`g`g`msft`ibm`aapl`aapl`msft`ibm`msft`g`ibm`g..
q)sym[0]:`twit
q)sym
`twit`aapl`msft`ibm
q)ev
`sym$`twit`twit`msft`ibm`aapl`aapl`msft`ibm`msft`twit`ibm`twit..
In contrast, to make the equivalent update to v requires changing every occurrence.
q)v
`g`g`msft`ibm`aapl`aapl`msft`ibm`msft`g`ibm`g..
q)@[v; where v=`g; :; `twit]
Dynamically Appending to an Enumeration Domain
One situation in which an enumeration is more complicated than working with the denormalized data is when you want to add a new value. Continuing with the example above, appending a new item to an ordinary list of symbols is a single operation.
q)sym:`g`aapl`msft`ibm
q)v:1000000?sym
q)ev:`sym$v
q)v,:`twtr
q)ev,:`twtr
'cast
The new value must first be added to the unique list.
q)sym,:`twtr
q)ev,:`twtr
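As an alternative sketch (not taken from the text above), q's enum-extend form of ? can do both steps at once: with the domain name on the left, it appends any missing symbol to the domain and returns the enumerated value. The short ev here is made up for illustration.

```q
q)sym:`g`aapl`msft`ibm
q)ev:`sym$`g`msft`g
q)ev,:`sym?`twtr     / extends sym if needed, then appends
q)sym
`g`aapl`msft`ibm`twtr
q)ev
`sym$`g`msft`g`twtr
```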
Resolving an Enumeration
An enumerated symbol can be substituted for its equivalent symbol value in most expressions. However, there are some situations in which you need non-enumerated values. One case is converting from one enumeration domain to another, which happens when copying from one kdb+ database to another or in merging two databases.
Given an enumerated symbol, or a list of such, you can recover the un-enumerated value(s) by applying the built-in value. In our on-going example,
q)sym:`g`aapl`msft`ibm
q)v:1000000?sym
q)ev:`sym$v
q)value ev
`aapl`g`msft`msft`ibm`msft`msft`msft`msft`msft`g`ibm`ibm`ibm..
q)v~value ev
1b
Reference: code.kx.com/q4m3/7_Tran…
By Jeffry A. Borror