Sunday, February 22, 2009

Native Vector

I think the D FAQ makes my point for me:
"the implementation of a string type in STL is over two thousand lines of code, using every advanced feature of templates. How much confidence can you have that this is all working correctly, how do you fix it if it is not, what do you do with the notoriously inscrutable error messages when there's an error using it, how can you be sure you are using it correctly"
Vectors are built into D:

char[8] str1; // eight chars, enough to hold a string of length 7 and the 0 terminator
char[] str2;  // a variable size string
char* str3;   // a C style unknown length string
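
To make the difference between the first two declarations concrete, here is a small sketch of my own (not from the D FAQ; the names `fixed` and `growable` are made up). It assumes a reasonably current D compiler and only shows that the length travels with the array.

```d
void main()
{
    char[8] fixed;      // static array: the length is part of the type
    char[] growable;    // dynamic array: starts out empty

    assert(fixed.length == 8);
    assert(growable.length == 0);

    growable ~= "abc";  // append; the runtime reallocates as needed
    assert(growable.length == 3);

    growable ~= 'd';
    assert(growable == "abcd");
}
```

No strlen, no manual realloc: the length is always one property access away.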


D strings are not NUL-terminated (although string literals are given an extra zero byte so that they can easily be used as C strings). Because the length is stored with the array instead of being marked by a terminator, a substring can be referenced in situ as a slice of the original, with no copying. String length is simply the vector's length.
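
A minimal sketch of what "in situ" means in practice, as I understand it. The example string and variable names are mine. `toStringz` from `std.string` is the library call for producing a terminated copy when a C function needs one.

```d
import std.string : toStringz;
import core.stdc.string : strlen;

void main()
{
    string s = "native vector";

    // The length is stored with the array, so there is no scan for a terminator.
    assert(s.length == 13);

    // A slice is a (pointer, length) view into the same memory; nothing is copied.
    string sub = s[7 .. 13];
    assert(sub == "vector");
    assert(sub.ptr == s.ptr + 7);

    // A slice is not guaranteed to be followed by a zero byte,
    // so convert explicitly before handing it to C.
    const(char)* c = toStringz(sub);
    assert(strlen(c) == 6);
}
```

In C the same substring would need either a copy or a write of a terminator into the original buffer.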

D is also able to avoid a pitfall of Java. When Java was developed, Unicode was all the rage, so Java strings are made of 16-bit characters (UTF-16). That means a string of plain ASCII text takes up twice the memory of the equivalent C string.

D's string type is UTF-8 (which I've only learned of recently; Wikipedia says UTF-8 was first presented in 1993, and Java was started in 1991). That means we can get all the juiciness of foreign character sets (Hebrew and Greek being of interest to me lately) without paying 2x for plain ASCII text: ASCII characters stay at one byte each, while Hebrew and Greek letters take two bytes each.
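
Here is a rough sketch of the byte counts, using my own sample strings. It relies on D's `wstring` (UTF-16) type for the comparison and on `walkLength` from `std.range` to count characters rather than bytes. Treat it as an illustration, not a benchmark.

```d
import std.range : walkLength;

void main()
{
    string ascii = "hello";     // UTF-8: one byte per ASCII character
    wstring wide = "hello"w;    // UTF-16: two bytes per character here

    assert(ascii.length * char.sizeof == 5);
    assert(wide.length * wchar.sizeof == 10);

    string greek = "λογος";     // five Greek letters
    wstring greekWide = "λογος"w;

    // .length counts code units (bytes for a UTF-8 string), not characters.
    assert(greek.length == 10);
    assert(greek.walkLength == 5);

    // For Greek the two encodings come out the same size.
    assert(greekWide.length * wchar.sizeof == 10);
}
```

So the saving is on the ASCII side: English text is half the size of its UTF-16 equivalent, and Greek or Hebrew text is no bigger.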
