strings | Thomas Sampson

February 28, 2010
by tomtech999

Working in C!

This last week I have been treated to working on a code base written entirely in C: no C++, all C!

While this has been an interesting challenge, there are definitely times when I really missed some of the features provided in C++ and realised how much the Standard Template Library is taken for granted! Anyway, as usual I only really post things on here which are of use to myself as reference at a later date, or items I found particularly hard to track down elsewhere on the web. This post falls into both categories, and being such a fundamental issue I’m surprised that it took so long to find a solution.

I was working on adding a small lightweight XML library to the existing C code base. I was continuously faced with the problem of requiring some form of expandable buffer, in most cases to hold C strings (null-terminated character arrays). The root of the problem is that there is no string type in C, no vectors and no dynamic arrays of any kind. This means that whenever you want to use one of the string functions such as strcpy, strcat or sprintf, you are always required to know the size of the buffer up front and allocate memory appropriately (freeing the memory when the buffer is disposed of).

The main function I was concerned with was sprintf, which I needed in order to build up strings for the XML documents. The length of the strings produced by sprintf could vary widely, so picking a fixed buffer size would in most cases be wasteful and in some cases dangerous, since any string longer than the buffer would overflow it and corrupt the heap. Luckily I discovered the following technique, which first prints the formatted string to a dummy file using fprintf (whatever is written is discarded, so no data ends up on disk). Once the fprintf call has completed (assuming the format string and parameters were valid), its return value tells you exactly how many characters were written, which is the length of the formatted string. Adding one for the null terminator gives the exact buffer size required for allocation!

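Once a buffer of the correct size is allocated (the character count plus one byte for the null terminator, which fprintf does not count), sprintf can be used with the same parameters to safely fill in the buffer. This method does have its drawbacks: there is extra overhead in formatting the string twice and in opening a handle to the dummy file. However, as far as I can find, this is the only way to do this properly without using fixed size buffers. One word of warning: this technique should work across platforms, but the name of the null device differs. It is “NUL” on Windows and “/dev/null” on Unix systems. A simple #ifdef should sort that out!

#include <stdio.h>
#include <stdlib.h>

#ifdef _WIN32
#define NULL_DEVICE "NUL"
#else
#define NULL_DEVICE "/dev/null"
#endif

/* ...then, inside a function: */
FILE* f = fopen(NULL_DEVICE, "w");
char* buffer = 0;
int n = -1;
if (f)
{
    n = fprintf(f, "hello %s", "world"); /* length only, the output is discarded */
    fclose(f);
}
if (n >= 0)
    buffer = (char*)malloc(n + 1); /* +1 for the null terminator */
if (buffer)
{
    sprintf(buffer, "hello %s", "world");
    /* use buffer */
    free(buffer);
}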
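
Where the compiler's library provides a C99-conforming vsnprintf, the same measure-then-allocate idea can be sketched without a dummy file, because calling vsnprintf with a null buffer and a size of zero returns the length the output would need. Note that this is a different technique from the fprintf approach, and the sketch is C++ rather than C. The helper name formatString is hypothetical, and the conforming-vsnprintf behaviour is an assumption that older toolchains may not satisfy.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical helper: formats printf-style arguments into a std::string.
// Assumes a C99-conforming vsnprintf, which returns the required length
// when given a null buffer and a size of zero.
std::string formatString(const char* format, ...)
{
    assert(format != nullptr);

    va_list args;
    va_start(args, format);

    // First pass: measure only. A copy is used because a va_list
    // cannot be reused once vsnprintf has consumed it.
    va_list argsCopy;
    va_copy(argsCopy, args);
    const int length = std::vsnprintf(nullptr, 0, format, argsCopy);
    va_end(argsCopy);

    std::string result;
    if (length > 0)
    {
        // Second pass: format into a buffer of exactly the right size,
        // +1 for the null terminator that vsnprintf writes.
        result.resize(static_cast<std::string::size_type>(length) + 1);
        std::vsnprintf(&result[0], result.size(), format, args);
        // Drop the terminator so size() equals the text length.
        result.resize(static_cast<std::string::size_type>(length));
    }
    va_end(args);
    return result;
}
```

Calling formatString("hello %s", "world") yields the 11-character string “hello world”, with the buffer sized by the first pass rather than guessed.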

November 27, 2008
by tomtech999

Python String Methods

A handy list of Python string methods

http://www.python.org/doc/2.3/lib/module-string.html

November 13, 2008
by tomtech999

Regular Expression Rules

Period / Full Stop

These are used in regular expressions to represent a wildcard, where a . can represent any single character (in most engines, anything except a line break). For example

t...s

matches

trees
trams
teens

Important: The wildcard represents ONE character only, not an undefined number of characters, hence t...s would not match the word “tracks”.
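
As a rough check of the wildcard rule, here is a small sketch using std::regex from the C++ standard library with its default ECMAScript grammar. The function names are hypothetical, and std::regex_match is used so that the whole word has to fit the pattern.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns true when the whole word fits "t...s":
// a 't', exactly three characters of any kind, then an 's'.
bool matchesWildcard(const std::string& word)
{
    static const std::regex pattern("t...s");
    return std::regex_match(word, pattern); // regex_match tests the entire string
}

// Checks taken from the examples in the text.
void checkWildcard()
{
    assert(matchesWildcard("trees"));
    assert(matchesWildcard("trams"));
    assert(matchesWildcard("teens"));
    assert(!matchesWildcard("tracks")); // four characters between 't' and 's'
}
```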

Question Mark

The question mark marks the previous character as optional: it may appear once or not at all.

For Example,

colou?r

Would match both the English and American spelling “colour” and “color”. The ? character signifies that the presence of the previous character is not necessary to form a match.
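
A minimal sketch of the optional character, again assuming std::regex with the default ECMAScript grammar; isColourSpelling is a hypothetical name.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns true for "colour" or "color": the 'u' may appear once or not at all.
bool isColourSpelling(const std::string& word)
{
    static const std::regex pattern("colou?r");
    return std::regex_match(word, pattern);
}

void checkColourSpelling()
{
    assert(isColourSpelling("colour"));
    assert(isColourSpelling("color"));
    assert(!isColourSpelling("colouur")); // ? allows at most one 'u'
    assert(!isColourSpelling("colr"));    // only the 'u' is optional
}
```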

Asterisk

The asterisk character * is used to represent zero or more instances of the previous character. For example,

www\.my.*\.com

would match…

http://www.mypage.com

http://www.my.com

http://www.myspace.com

http://www.myownpersonalhomepage.com

Note: In this example the character before the asterisk is a . (the wildcard), so .* accepts any number of arbitrary characters, including none. The backslashes simply escape the other . symbols, which need to be matched as literal dots in our URL.

Plus

The plus symbol + is very similar to the asterisk * symbol, however it matches one or more instances of the previous character, so at least one must be present. Using the previous example:

www\.my.+\.com

would match..

http://www.myspace.com

http://www.mypage.com

etc

but would not match…

http://www.my.com
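
The difference between * and + shows up on exactly one of the example URLs. The sketch below assumes std::regex (ECMAScript grammar) and uses std::regex_search, so the pattern may match anywhere inside the URL; the function names are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// "my" followed by zero or more characters, then ".com".
bool matchesStar(const std::string& url)
{
    static const std::regex pattern(R"(www\.my.*\.com)");
    return std::regex_search(url, pattern);
}

// "my" followed by at least one character, then ".com".
bool matchesPlus(const std::string& url)
{
    static const std::regex pattern(R"(www\.my.+\.com)");
    return std::regex_search(url, pattern);
}

void checkStarAndPlus()
{
    assert(matchesStar("http://www.myspace.com"));
    assert(matchesPlus("http://www.myspace.com"));
    assert(matchesStar("http://www.my.com"));  // .* is happy with nothing
    assert(!matchesPlus("http://www.my.com")); // .+ needs at least one character
}
```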

{N} Number

This form lets us specify exactly how many times the previous character must occur. For example

w{3}\.mysite\.com

will only match

http://www.mysite.com

and would not match

w.mysite.com or

ww.mysite.com

Ranges {min,max}

This form lets us specify a minimum and maximum number of times the previous character may occur, for example

10{1,3} years

Would match the following…

10 years

100 years

1000 years

but would not match

1 years

10000 years

100000 years
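
Both counted forms can be tried out with a short sketch, assuming std::regex with the ECMAScript grammar and hypothetical function names. One thing worth noticing is that a search only limits the matched text itself: “wwww.mysite.com” still contains “www.mysite.com”, so an unanchored search finds a match there.

```cpp
#include <cassert>
#include <regex>
#include <string>

// True when the text contains exactly three 'w's followed by ".mysite.com".
bool containsSiteHost(const std::string& text)
{
    static const std::regex pattern(R"(w{3}\.mysite\.com)");
    return std::regex_search(text, pattern); // may match anywhere in the text
}

// True when the whole text is a '1', one to three '0's, then " years".
bool isValidYears(const std::string& text)
{
    static const std::regex pattern("10{1,3} years");
    return std::regex_match(text, pattern);
}

void checkCounts()
{
    assert(containsSiteHost("http://www.mysite.com"));
    assert(!containsSiteHost("w.mysite.com"));
    assert(!containsSiteHost("ww.mysite.com"));
    assert(containsSiteHost("wwww.mysite.com")); // the last three 'w's match

    assert(isValidYears("10 years"));
    assert(isValidYears("100 years"));
    assert(isValidYears("1000 years"));
    assert(!isValidYears("1 years"));
    assert(!isValidYears("10000 years"));
}
```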

Summary

Quantifier	Description
?	Matches the preceding element 0 or 1 times.
*	Matches the preceding element 0 or more times.
+	Matches the preceding element 1 or more times.
{num}	Matches the preceding element exactly num times.
{min,max}	Matches the preceding element at least min times, but not more than max times.

taken from http://docs.activestate.com/komodo/4.4/regex-intro.html

The | Symbol

Using the | symbol lets us express alternatives (an “or”) in our regular expression. Here is an example

Model (R|S)1000

would match

Model R1000

Model S1000

but would not match

Model X1000

Model Z1000
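
Because the alternatives sit inside parentheses, the engine also remembers which one matched. A small sketch, assuming std::regex and a hypothetical helper named modelSeries:

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns "R" or "S" when the text is one of the two known models,
// or an empty string when it is neither.
std::string modelSeries(const std::string& text)
{
    static const std::regex pattern("Model (R|S)1000");
    std::smatch match;
    if (std::regex_match(text, match, pattern))
        return match[1].str(); // contents of the first parenthesised group
    return std::string();
}

void checkModelSeries()
{
    assert(modelSeries("Model R1000") == "R");
    assert(modelSeries("Model S1000") == "S");
    assert(modelSeries("Model X1000").empty());
    assert(modelSeries("Model Z1000").empty());
}
```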

Grouping with ( )

Parentheses “()” are used to group characters and expressions within larger, more complex regular expressions. Quantifiers that immediately follow the group apply to the whole group. For example:

(abc){2,3}

matches

abcabc
abcabcabc
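
To see what the parentheses change, the sketch below compares the grouped pattern with an ungrouped one, where the quantifier applies to the final c only. It assumes std::regex with the ECMAScript grammar, and the function names are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Whole text must be "abc" repeated two or three times.
bool matchesGrouped(const std::string& text)
{
    static const std::regex pattern("(abc){2,3}");
    return std::regex_match(text, pattern);
}

// Without the group, {2,3} applies to the 'c' alone: "ab" then 2-3 'c's.
bool matchesUngrouped(const std::string& text)
{
    static const std::regex pattern("abc{2,3}");
    return std::regex_match(text, pattern);
}

void checkGrouping()
{
    assert(matchesGrouped("abcabc"));
    assert(matchesGrouped("abcabcabc"));
    assert(!matchesGrouped("abc"));          // only one repetition
    assert(!matchesGrouped("abcabcabcabc")); // four repetitions is too many

    assert(matchesUngrouped("abcc"));
    assert(matchesUngrouped("abccc"));
    assert(!matchesUngrouped("abcabc"));
}
```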

More rules can be found at http://docs.activestate.com/komodo/4.4/regex-intro.html

March 24, 2008
by tomtech999

C++ String to Uppercase

In my project today I was taking the name of a user and then using that to generate a data file, e.g. “fred.dat”. I decided that I needed to keep the same data file for each user, regardless of how they entered their name. This name was not a username and could be written in any mixture of upper and lower case.

To resolve this I decided to make the filenames for each player uppercase when saved and loaded, e.g. FRED.dat but I couldn’t find any easy way to do so. The toupper() function I am familiar with only works on characters so I designed the following simple function to convert a string to uppercase.

NB: Here I use the string data type, not a C string. The resultant string can of course be converted to a C string by calling the member function .c_str()

————————–

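#include <cctype> // std::toupper
#include <string> // std::string

using std::string;

string toUpperString(const string& text)
{
    string toReturn(text); // duplicate input for manipulation

    for (string::size_type i = 0; i < text.length(); ++i) // for each character in string
    {
        // go via unsigned char: toupper is undefined for negative char values
        toReturn[i] = static_cast<char>(std::toupper(static_cast<unsigned char>(text[i])));
    }

    return toReturn; // return in upper case
}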

Please correct me if there is a simpler way to implement this but for now this works a treat.
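
One possibly simpler route is std::transform from <algorithm>, which applies a function to every character in place. This is only a sketch: the name toUpperTransform is hypothetical, and the lambda assumes a C++11 compiler.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Takes the string by value, so the caller's copy is left untouched.
std::string toUpperTransform(std::string text)
{
    std::transform(text.begin(), text.end(), text.begin(),
                   // unsigned char avoids undefined behaviour in toupper
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return text;
}

void checkToUpperTransform()
{
    assert(toUpperTransform("fred") == "FRED");
    assert(toUpperTransform("fReD") + ".dat" == "FRED.dat");
    assert(toUpperTransform("").empty());
}
```

Whichever way a user types their name, toUpperTransform(name) + ".dat" then gives the same file name.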
