Thursday 14 August 2014

LoadSim

LoadSim is now finished.

LoadSim creates different types of synthetic workloads on a Linux system. It consists of a script that calls other programs to generate each workload.

Its dependencies are curl and lookbusy (which can be downloaded from
http://devin.com/lookbusy/download/lookbusy-1.4.tar.gz ).

The CPU and memory workloads are provided by lookbusy. The network workload is provided by curl, which downloads a file from and uploads a file to an FTP server. The disk read and write workloads are generated by a C program that I have written.

CPU workload
Lookbusy looks in /proc/cpuinfo to find out how many CPU cores the current system has. If more than one CPU is found, the process forks and each process runs its load on a separate CPU. (The number of CPUs can be specified directly if needed.)

The workload is given as a percentage of CPU to be used across all cores. The load itself is generated by performing many additions in quick succession. On its own this would simply saturate the CPU, so lookbusy continuously monitors /proc/stat for changes in CPU usage and tries to keep total system CPU load at the desired level: it puts the calculating process to sleep when usage goes over the specified level and resumes it when usage drops below.
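The feedback half of this loop can be sketched in C. This is not lookbusy's actual code, only a minimal illustration under stated assumptions: the helper names (parse_cpu_line, busy_percent, should_sleep) are hypothetical, and counting iowait as idle time is my own choice. It compares two snapshots of the aggregate "cpu" line from /proc/stat and decides whether the load generator should pause.

```c
#include <stdio.h>

/* Counters taken from the aggregate "cpu" line of /proc/stat (in clock ticks). */
struct cpu_sample {
    unsigned long long busy;   /* user + nice + system + irq + softirq + steal */
    unsigned long long total;  /* busy + idle + iowait */
};

/* Parse one aggregate "cpu ..." line. Returns 0 on success, -1 if malformed.
 * Not meant for the per-core "cpu0", "cpu1", ... lines. */
int parse_cpu_line(const char *line, struct cpu_sample *out)
{
    unsigned long long user, nice, sys, idle;
    unsigned long long iowait = 0, irq = 0, softirq = 0, steal = 0;

    int n = sscanf(line, "cpu %llu %llu %llu %llu %llu %llu %llu %llu",
                   &user, &nice, &sys, &idle, &iowait, &irq, &softirq, &steal);
    if (n < 4)
        return -1;

    out->busy = user + nice + sys + irq + softirq + steal;
    out->total = out->busy + idle + iowait;
    return 0;
}

/* CPU usage in percent between two snapshots, or -1.0 on error. */
double busy_percent(const char *before, const char *after)
{
    struct cpu_sample a, b;

    if (parse_cpu_line(before, &a) != 0 || parse_cpu_line(after, &b) != 0)
        return -1.0;
    if (b.total <= a.total || b.busy < a.busy)
        return -1.0;  /* no time has passed, or the counters went backwards */

    return 100.0 * (double)(b.busy - a.busy) / (double)(b.total - a.total);
}

/* The throttle decision: sleep when measured usage is above the target. */
int should_sleep(double measured_pct, double target_pct)
{
    return measured_pct > target_pct;
}
```

In a real generator these helpers would sit inside a loop that reads /proc/stat, performs a burst of additions, and sleeps whenever should_sleep() returns true.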

Memory workload
Lookbusy is also used to generate the memory workload. The allocated memory is continually stirred to ensure that it is actually allocated by the virtual memory system and to put pressure on the system to keep it resident. In other words, it keeps the memory in use rather than just allocating it and letting it sit idle. It has to do this because of the way Linux manages virtual memory: if the allocated memory is not continuously stirred, Linux treats it as low priority and may move it out of physical memory later if other needs arise.

The memory workload is generated with the C memcpy() function, which copies bytes between memory blocks. It takes three arguments: destination, source and count. The destination and source arguments point to the destination and source memory blocks respectively, and count specifies the number of bytes to copy.

If, however, the memmove() function is available, lookbusy will use that instead. memmove() does everything memcpy() does, but also handles overlapping memory blocks. With memcpy(), if the two blocks of memory overlap, the behaviour is undefined and the copy might not work properly: some of the data in source might be overwritten before being copied. memmove() ensures that the source data in the overlapped region is copied before being overwritten.

However, in our case we do not care whether data in memory is overwritten, only that a workload is performed, so it matters little to me which of these functions is used.
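As a rough illustration of what "stirring" means, the sketch below allocates a buffer and repeatedly copies one half of it over the other with memmove(). The function names (stir, hold_and_stir) and the half-and-half copy pattern are my own assumptions, not taken from the lookbusy source.

```c
#include <stdlib.h>
#include <string.h>

/* Copy the first half of the buffer over the second half, so that the pages
 * are read and written again. Returns the number of bytes copied. */
size_t stir(unsigned char *buf, size_t len)
{
    size_t half = len / 2;

    /* The halves do not overlap, so memcpy() would be equally valid here. */
    memmove(buf + half, buf, half);
    return half;
}

/* Allocate len bytes, stir them `rounds` times, then release them.
 * Returns the total number of bytes copied, or 0 if allocation failed. */
size_t hold_and_stir(size_t len, unsigned rounds)
{
    unsigned char *buf = malloc(len);
    size_t copied = 0;

    if (buf == NULL)
        return 0;

    memset(buf, 0xA5, len);  /* first write makes the kernel back the pages */
    for (unsigned i = 0; i < rounds; i++)
        copied += stir(buf, len);

    free(buf);
    return copied;
}
```

A long-running load generator would keep the buffer allocated and sleep between rounds instead of freeing it straight away.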

Similar to the CPU workload, lookbusy will also alternate between copying and sleeping to keep the workload at the desired level.

Network workload
The network workload was produced by a command-line application called curl, which is written in C. Curl transfers data to and from a server using various protocols, and supports both downloading and uploading.

Curl uses a library called libcurl for all its functions. It is portable, works on a large range of operating systems, and has its own API for other programs to call. I chose curl because it is written in C and many Linux distributions ship with it by default, which keeps the dependencies to a minimum.

Curl was also chosen because it allows the bandwidth to be limited during a transfer with the --limit-rate option. This was used alongside an FTP server that I had set up to download from and upload to. The virtual machines are loaded with some large garbage files for this purpose. Curl does not resume a previous transfer unless it is explicitly asked to, which means that each time we execute these commands they start downloading or uploading the file from the beginning, which is a good thing in my case.
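To show what such rate-limited transfers look like, here is a small C sketch that builds the two curl command lines. LoadSim itself calls curl from a script, so this is only an illustration: the function names are hypothetical, and the arguments are assumed to be trusted strings without spaces (no shell quoting is done). The curl options used are --limit-rate, -o, -T and -s.

```c
#include <stdio.h>

/* Build a rate-limited curl download command. The data is discarded
 * (-o /dev/null) because only the network traffic matters.
 * Returns 0 on success, -1 if the command does not fit in `out`. */
int build_download_cmd(char *out, size_t size, const char *url,
                       unsigned kb_per_s)
{
    int n = snprintf(out, size, "curl -s --limit-rate %uk -o /dev/null %s",
                     kb_per_s, url);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Build a rate-limited curl upload command (-T uploads a local file).
 * Returns 0 on success, -1 if the command does not fit in `out`. */
int build_upload_cmd(char *out, size_t size, const char *file,
                     const char *url, unsigned kb_per_s)
{
    int n = snprintf(out, size, "curl -s --limit-rate %uk -T %s %s",
                     kb_per_s, file, url);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

For a hypothetical server, build_download_cmd(cmd, sizeof cmd, "ftp://ftp.example.test/garbage.bin", 500) would produce a command that downloads at roughly 500 kilobytes per second.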

The FTP server does not use encryption, as it would introduce extra unwanted CPU overhead into the virtual machines.

Disk read and write workload
I also managed to get disk read and write working; however, it is not very stable (more on this later).

For disk read, I created a file filled with randomly generated strings, 1MB per line, with approximately 2000 lines for a 2GB file. I then used fopen() and fread() to read from the file into memory, telling it to read the specified number of megabytes and then wait for 1 second (to create a disk read workload of that many megabytes per second).
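The read-then-wait idea can be sketched as follows. This is a reconstruction from the description above, not the actual LoadSim source: the function names, the 1 MiB buffer and the rewind at end of file are all assumptions of mine.

```c
#include <stdio.h>
#include <unistd.h>

#define MIB (1024u * 1024u)

/* Read up to `mb` mebibytes from fp in 1 MiB pieces.
 * Returns the number of bytes actually read (less than requested at EOF). */
size_t read_megabytes(FILE *fp, size_t mb)
{
    static unsigned char buf[MIB];
    size_t total = 0;

    for (size_t i = 0; i < mb; i++) {
        size_t got = fread(buf, 1, sizeof buf, fp);
        total += got;
        if (got < sizeof buf)
            break;  /* end of file or read error */
    }
    return total;
}

/* Generate roughly `mb_per_sec` MB/s of reads from `path` for `seconds`
 * seconds. Returns 0 on success, -1 if the file cannot be opened. */
int run_read_load(const char *path, size_t mb_per_sec, unsigned seconds)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    for (unsigned s = 0; s < seconds; s++) {
        if (read_megabytes(fp, mb_per_sec) < mb_per_sec * MIB)
            rewind(fp);  /* reached the end: start over from the beginning */
        sleep(1);        /* one read burst per second */
    }

    fclose(fp);
    return 0;
}
```

Because the time spent reading is added to the one-second sleep, the real rate of a loop like this ends up a little below the requested figure. Reads that are served from the page cache may also never reach the disk at all.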

The same principle was applied for disk write. A file was opened and read into memory, and its contents were then written from memory into a new file.

Disk read and write are not very stable when used together at speeds greater than ~13MB/s. Disk read slows down to about 2MB/s while disk write carries on at its own workload.
-------------------------

The LoadSim Script
The LoadSim script calls these programs to generate a workload. It accepts and parses command line arguments that tell each application how much load to generate.

I used getopts to parse the command line arguments. The script calls lookbusy twice, to generate the CPU and memory workloads separately.

Curl is also called twice: the first time to download a remote file, and the second time to upload a local file to the FTP server.

There was also a slight hitch with lookbusy. If you specify an amount of CPU or memory to be used, lookbusy itself will use that amount of CPU or memory, rather than creating a workload UP TO that level. For example, if my system was already running with 300MB of memory in use and I told lookbusy that I wanted a workload of 500MB, then lookbusy itself would end up using 500MB and my total memory usage would be 800MB. I solved this in my script by calculating the current resource usage and subtracting it from the amount specified. That way, if my system is running with 300MB in use and I specify a workload of 500MB, lookbusy will perform a workload of 200MB to make a total of 500MB of memory used.
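The adjustment is a single subtraction, shown here in C for illustration (the script itself does this in shell). The function name is hypothetical, and clamping the result at zero when the system is already above the target is my own assumption about the sensible behaviour.

```c
/* Amount of extra load to request so that the system total reaches `target`.
 * Never negative: if the system already uses more than the target,
 * no additional load is requested. */
long extra_load(long target, long current)
{
    long extra = target - current;
    return extra > 0 ? extra : 0;
}
```

With the numbers from the example, extra_load(500, 300) gives 200, so lookbusy would be asked for 200MB.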
-----------------------

The scripts can be found on this link: