Recent Question/Assignment
Objectives:
• Continue writing non-trivial C programs
• Exercise with process/thread creation and use
• Exercise with multi-threaded programming
• Exercise with the creation and use of shared memory as an IPC facility
• Exercise with process/thread synchronization using semaphores
• Exercise with reader/writer locks through the file locking mechanisms of Unix.
1 Project Description
In this project, you will design and implement a client-server based synchronized and parallel keyword search application where a single server will receive search requests including a directoryPath and a keyword from multiple clients, and will search the keyword in all the files located in the specified directory. Client and server will communicate through a shared memory region. Multiple client processes will be able to talk to the server at the same time. This project will enable you to practice multiple that you have learned so far in this course including POSIX shared memory, POSIX semaphores, readers/writers locks, forking child processes, and multi-threading using Pthreads.
1.1 Client Program
The client will issue search requests to the server. The name of the client program must be spks_client (synchronized parallel keyword search) and it will be invoked as follows:
./spks_client req-queue-size inputfile
The req-queue-size parameter specifies the capacity of the request queue that will be used between the client processes and the server process for communication purposes, and inputfile is the name of the input file including all the search requests to be directed to the server. The client process will get the search requests from the input file line by line (one request at a time) and will send these requests to the server through the request queue. The format of the search requests included in the input file will be as follows:
directoryPath keyword
You can make the following assumptions about the inputfile :
1. The directoryPath is a full path not exceeding the MAXDIRPATH characters
2. The keyword is a single word not exceeding the MAXKEYWORD characters, enclosed by whitespace characters not including any whitespace characters inside. Note that a whitespace character can be either a SPACE , or a TAB , or a NEWLINE character.
A sample input file content can be as follows:
/home/naltipar/dirOne is
/home/naltipar/dirOne a
/home/naltipar/dirTwo a
When the client process reads a search request from the inputfile , a request message will be generated and send to the server using the request queue implemented on the shared memory. Several client processes can send requests to the server using the same request queue. A special com- mand will be used to terminate the server. When the server receives this special search command, then it will terminate and will not be able to serve any further requests. This special command will only include -exit- in the message as follows:
exit
1.2 Server Program
The server will accept search requests from several clients and for each request, it will search the texttt keyword in all the files sitting in the given directoryPath using multiple threads. The server program must be named spks_server and it will be invoked as follows:
./spks_server req-queue-size buffersize
As in the client case, req-queue-size specifies the capacity of the request queue to be used between the server and the client processes. The same req-queue-size value should be passed to the server and the client for propery settings. buffersize specifies the size of the bounded buffer to be used between the threads inside the server (more info will be given later).
The server process should be started before the client processes. When started, the server will cre- ate a shared memory segment for IPC and some semaphores for mutual exclusion and synchronization purposes. The shared memory segment and the semaphores will be initialized by the server. Shared memory should be created with some name (as specified by the man page for shm_open()) that you will select and define in your program. The name will be given as a parameter to the shm_open() function that will be used to create a shared memory region. The clients will use the same name to access the shared memory. Since you are the programmer of both the client and the server, you will decide this name. Similar situation applies to the named semaphores that may be created and initialized by the server as well. You will decide which semaphores and how many semaphores you will use and how you initialize them. It depends on how you design your synchronization and critical section solutions. You will use POSIX semaphores.
The server will do some initialization on the shared memory segment. As part of this initialization, it can also create and initialize a shared array located in the shared memory segment previously cre- ated. This array will be used as the request queue and it can hold req-queue-size number of requests. Request queue can be accessed by all client processes and the server process. While trying to insert an item, if a client process finds the request queue full, then it should go into sleep (block) until there is space for one more request. Similarly, if there is no request in the queue, server process should go into sleep. Besides, since many processes access the request queue concurrently, the access must be coordinated. Otherwise we can have race conditions. You can use named semaphores to protect the request queue and synchronize the processes.
The server will perform the search requests received from the request queue as follows. First, it will immediately create a new child process to serve each request and the search information will be passed to this child process. The child process will then create a number of threads to serve the search request. Number of threads to be created by the child process depends on the number of files located inside the directoryPath . If there are N files inside the directoryPath , then the child process will create N + 1 threads: N worker threads and 1 printer thread. Each worker thread will be responsible from a different search file, it will scan the search file and look for the matching lines based on the keyword . Your program will only search the first level files in the given directoryPath . If the directoryPath includes any sub-directories, you should skip them without searching the files in these sub-directories.
A match in a line will be an exact match of an entire word only and the maximum length of a line in a search file will never exceed MAXLINESIZE. When a worker thread finds a match, it will prepare an item that includes the following information: the name of the file where the match has occurred, the line number of the match, and the line itself (i.e. the line string excluding the newline character at the end). If the keyword is seen, for example, in 5 separate lines, then 5 separate items will be created. If the keyword is seen more than once in the same line, then a single item will be created for that line. Whenever an item is created, then the worker thread will add this item to a memory buffer: a bounded buffer.
The second argument of the server ( buffersize ) specifies the size of the bounded buffer to be used by the threads. Each worker thread will add the produced items to this bounded buffer. The buffer can hold at most buffersize items. This bounded buffer can be implemented in one of two ways: 1) as a linked list of items; or 2) as a circular array of pointers to items. You can choose either one of these implementation options. A worker thread will add a new item to the end of the buffer. The buffer has to be accessed by all worker threads. While trying to insert an item, if a worker thread finds the buffer full, then the thread has to go into sleep (block) until there is space for one more item. Since all worker threads will access the buffer concurrently, the access must be coordinated. Otherwise we might end up with a corrupted buffer. You will use semaphores to protect the buffer and to synchronize the threads so that they can sleep and wake-up when necessary. Check the sem-pc.c example provided in the class website.
Beside the worker threads, printer thread will also access the bounded buffer. The job of the printer thread will be to retrieve the items from the bounded buffer and print them into an output file. While the items are added to the buffer by the worker threads, the printer thread can work concurrently and try to retrieve the items from the buffer and print them to the output file. The printer thread will try to retrieve one item from the bounded buffer at a time. If the buffer is empty, the printer thread has to go into sleep until the buffer has at least one item. When the printer thread is successful in retrieving an item from the buffer, it will print it to the output file in the following format:
filename:linenumber:linestring
For example, if the keyword is -a- and the content of the file to be searched (assume the file name is file1.txt) is as follows:
I raised my daughter in the American fashion; I gave her freedom, but taught her never to dishonor her family. She found a boy friend, not an Italian. She went to the movies with him, stayed out late. Two months ago he took her for a drive, with another boy friend. They made her drink whiskey and then they tried to take advantage of her. She resisted; she kept her honor. So they beat her like an animal. When I went to the hospital her nose was broken, her jaw was shattered and held together by wire, and she could not even weep because of the pain.
Then, the printer thread will print the following to the output file:
file1.txt:4:family. She found a boy friend, file1.txt:7:Two months ago he took her for a
The output file will be named as output.txt and all child processes (only the printer thread from each child process) will write their output into this single output file. Since the output file can be modified by multiple processes concurrently, you should also synchronize the access to this output file. For this purpose, you can use the fcntl system call available in Unix providing file locking mechanism. Note that advisory locking should be sufficient for your case and you can assume that the maximum length of an output line will never exceed MAXOUTSIZE. You do not need to worry about the ordering of the lines in the output file; before testing your output, we will sort the output file ourselves.
Pay attention to the following issues while developing your application:
• The output should not contain the name of the directory, just the name of the file.
• Only the lines that include the exact match of an entire word is printed.
• If the keyword is seen more than once in a line, then a single output will be printed for that line. For example, if we had a line as -a a a-, then the number of matches on this line would be 3 for the keyword -a-. However, we have to report such a line only once.
• For each search request received from the request queue, a specific bounded buffer, a specific printer thread, and bunch of worker threads (one for each file in the given directoryPath ) are created by the child process that is assigned for that search request.
• Your program will only search the first level files in the given directoryPath .
• All the worker threads for a given search request do the same thing on different files and fill the same bounded buffer.
• Your program passes each received search request to a child process and handle the next search request immediately without waiting for the created child to complete its execution. However, you should also make sure that your program does not terminate before printing all the outputs of the already received requests. To achieve this, you can count the number of children created by the parent and after handling all the requests (exit request is received), then the parent process can call wait() system call in a loop for count times. We will make sure that the exit request is sent as the last request in the queue.
Figure 1 plots the general structure of this application:
Worker
File
.. ..
Bounded Buffer
Request Queue
. . .
Printer . .
Client
Figure 1: System Structure
2 The Shared Memory Segment
All data between the clients and the server will be passed through the shared memory. You will use POSIX shared memory API. Some of the useful functions that are part of the POSIX shared memory
API are the following: shm_open(), mmap(), shm_unlink(), ftruncate(), fstat(). The size of the created shared memory should be determined by you considering the content to be kept in the shared memory (request queue, any variable to be stored in the shared memory for indexing the request queue if necessary etc).
Useful Clean-up Commands
• While testing your project, your server might create multiple child processes and terminate un- expectedly due to a segmentation fault. At that point, you might have several zombie processes existing in the system. In order to check current processes in the system including the -spks- name in it, you can run the following command:
– ps aux | grep spks
• If you want to kill a process by its name, you can run the following command:
– pkill -f name
• If you want to kill a process by its process id (PID), you can run the following command:
– kill -9 PID
• If you want to remove all zombie processes by killing their parents, you can run the following command:
– kill -HUP $(ps -A -ostat,ppid | awk '/[zZ]/{print $2}')
• Shared memory regions or named semaphores you created my stay in the system if your server is closed unexpectedly without unlinking these objects. If you do not remove these objects and re-run your program, then these regions will be reused and unexpected results will be produced. Therefore, you should clean these objects before re-running your programs. You can run the following code to see the existing shared memory/semaphore objects in the system:
– ls -al /run/shm or – ls -al /dev/shm
• If you want to clean the existing shared memory/semaphore objects, then run the provided clean_os.sh script.
3 Constants and Tips
• MAXDIRPATH will be 1024
• MAXKEYWORD will be 256
• MAXLINESIZE will be 1024
• MAXOUTSIZE will be 2048
• Remember to use thread-safe library functions inside your threads! (for instance strtok_r() instead of strtok()).
• Named Posix semaphores can be shared among processes (clients and server).
• While testing your application, remember to clean the previously opened shared memory and semaphores before restarting your application. When your application did not terminate prop- erly, these regions could be left open without being unlinked.
4 Development
You will develop your program in a Unix environment using the C programming language. gcc must be used as the compiler. You will be provided a Makefile and your program should compile without any errors/warnings using this Makefile. Black-box testing will be applied and your program’s output will be compared to the correct output. A black-box testing script will be provided on the class website, make sure that your program produces the success messages in that test. A more complicated test (possibly more than one test) might be applied to grade your program. Submissions not following the specified rules here will be penalized.
5 Submission
Memory leaks and compilation warnings will be penalized if any. You can check the compilation warnings using the -Wall flag of gcc.
5.1 What to Submit
1. spks_client.c including the source code of the client.
2. spks_server.c including the source code of the server.
3. README.txt including the following information:
• Did you fully implement the project as described? If not, which parts are not implemented at all or not implemented as following the specified description? Note that for this project, it is very easy to print out the required output to the screen without using any shared memory, bounded buffers, multi-threading, or file locking mechanisms. Implementing the project without following the specified description will be considered as plagiarism if not properly mention in this README.txt file.
• Structure of your shared memory region. Which size did you use and why? What did you store to the shared memory in which order?