The input and output operations that we have performed so far were done through screen and keyboard
only. When the program terminates, all the entered data is lost because primary memory is volatile.
When the data has to be used later, it must be kept on a permanent storage device. C
supports the concept of files through which data can be stored on the disk or secondary storage device.
The stored data can be read whenever required. A file is a collection of related data placed on the
disk.
File handling in C can be broadly categorized into two types:
High level (standard files or stream oriented files)
Low level (system oriented files)
High level file handling is managed by library functions, while low level file handling is managed by
system calls. High level file handling is more commonly used since it is easier to manage and hides most
of the details from the programmer.
Text and Binary modes
There are two ways of storing data in files, binary format and text format.
In text format, data is stored as lines of characters with each line terminated by a newline
character('\n').
In binary format, data is stored on the disk in the same way as it is represented in the computer
memory. Unix systems do not make any distinction between text files and binary files.
Text files are in human readable form and they can be created and read using any text editor, while
binary files are not in human readable form and they can be created and read only by specific
program written for them. The binary data stored in the file can't be read using a text editor.
On a system where an int occupies 2 bytes, the integer 1679 takes only 2 bytes in a binary file
while it occupies 4 bytes in a text file. In the binary file it is stored as an integer (sizeof(int)
bytes, which is 4 on most current systems), while in the text file it is stored as a sequence of
4 characters, i.e. '1', '6', '7', '9'.
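The difference in size can be checked with a small experiment. The following is a minimal sketch, not a definitive implementation: the helper names save_as_text(), save_as_binary() and count_bytes() and the file names are our own, chosen only for illustration.

```c
#include <stdio.h>

/* Counts the bytes in a file by reading it in binary mode; returns -1 on error. */
long count_bytes(const char *path)
{
    FILE *fp = fopen(path, "rb");
    long n = 0;
    if (fp == NULL)
        return -1;
    while (fgetc(fp) != EOF)
        n++;
    fclose(fp);
    return n;
}

/* Stores value as decimal characters, e.g. '1', '6', '7', '9'. */
int save_as_text(const char *path, int value)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    fprintf(fp, "%d", value);
    return fclose(fp);
}

/* Stores value exactly as it is laid out in memory (sizeof(int) bytes). */
int save_as_binary(const char *path, int value)
{
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    fwrite(&value, sizeof value, 1, fp);
    return fclose(fp);
}
```

After calling save_as_text("num.txt", 1679) and save_as_binary("num.bin", 1679), count_bytes("num.txt") gives 4, while count_bytes("num.bin") gives sizeof(int), whatever that is on the machine in use.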
Concept of buffer
Buffer is an area in memory where the data is temporarily stored before being written to the file.
When we open a file, a buffer is automatically associated with its file pointer. Whatever data we
send to the file is not immediately written to the file. First it is sent to the buffer and when the
buffer is full then its contents are written to the file. When the file is closed, all the contents
of buffer are automatically written to the file even if the buffer is not full. This is called
flushing the buffer. We can also flush the buffer explicitly by calling the function fflush().
The concept of buffer is used to increase efficiency. If there were no buffer, we would have to
access the disk each time for writing even a single byte of data. This would take a lot of time
because each time the disk is accessed, the read/write head has to be repositioned. When buffering
is done, the data is collected in the buffer and data equal to the size of buffer is written to the
file at a time, so the number of disk accesses decreases, which improves efficiency.
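The effect of buffering and of fflush() can be observed with a sketch like the one below. It is only an illustration under some assumptions: the helper names bytes_in_file() and show_flush_effect() are hypothetical, setvbuf() is used to request full buffering, and the size seen before fflush() is typically 0 but this is not guaranteed by the language.

```c
#include <stdio.h>

/* Counts the bytes currently stored in the file at path; returns -1 on error. */
static long bytes_in_file(const char *path)
{
    FILE *fp = fopen(path, "rb");
    long n = 0;
    if (fp == NULL)
        return -1;
    while (fgetc(fp) != EOF)
        n++;
    fclose(fp);
    return n;
}

/* Writes text through a fully buffered stream and reports the file size
   before and after fflush(). Returns the size after flushing, or -1. */
long show_flush_effect(const char *path, const char *text)
{
    long before, after;
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    setvbuf(fp, NULL, _IOFBF, BUFSIZ); /* ask for full buffering */
    fputs(text, fp);                   /* data goes into the buffer */
    before = bytes_in_file(path);      /* usually the file is still empty here */
    fflush(fp);                        /* force the buffer out to the file */
    after = bytes_in_file(path);
    fclose(fp);
    printf("before fflush: %ld bytes, after fflush: %ld bytes\n", before, after);
    return after;
}
```

A call such as show_flush_effect("demo.txt", "hello") returns 5, the number of bytes that are in the file once the buffer has been flushed.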
The steps for file operations in C programming are as follows:
Open a file
Read the file or write data in the file
Close the file
The functions fopen() and fclose() are used for opening and closing the files respectively.
Opening a file
A file must be opened before any I/O operations can be performed on that file. The process of
establishing a connection between the program and file is called opening the file.
A structure named FILE is defined in the file stdio.h that contains information about the
file such as its status, buffer, current position, end of file status etc. All these details are
hidden from the programmer; the C library and the operating system take care of all these things.
A file pointer is a pointer to a structure of type FILE. Whenever a file is opened, a structure of
type FILE is associated with it, and a file pointer that points to this structure identifies this
file. The function fopen() is used to open a file.
Declaration:
FILE *fopen(const char *filename, const char *mode);
The fopen() function takes two strings as arguments: the first one is the name of the file to be
opened and the second one is the mode that decides which operations (read, write, append etc.) are
to be performed on the file. On success, fopen() returns a pointer of type FILE * and on error it
returns NULL.
The return value of fopen() is assigned to a FILE pointer declared previously. For example:
FILE *fp;
fp = fopen("filename", "mode");
After opening the file with fopen(), the name of the file is not used in the program for any
operation on it. Whenever we have to perform an operation on the file, we use the file pointer
returned by fopen(). So the name of the file is sometimes known as its external name, while the
file pointer associated with it is known as its internal name.
The second argument represents the mode in which the file is to be opened. The possible values of
mode are:
"w" (write): If the file doesn't exist then this mode creates a new file for writing, and if
the file already exists then the previous data is erased and the new data entered is written
to the file.
"a" (append): If the file doesn't exist then this mode creates a new file, and if the file
already exists then the new data entered is appended at the end of existing data. In this
mode, the data existing in the file is not erased as in "w" mode.
"r" (read): This mode is used for opening an existing file for reading purpose only. The
file to be opened must exist and the previous data of file is not erased.
"w+" (write+read): This mode is same as "w" mode but in this mode we can also read and
modify the data. If the file doesn't exist then a new file is created and if the file exists
then previous data is erased.
"r+" (read+write): This mode is same as "r" mode but in this mode we can also write and
modify existing data. The file to be opened must exist and the previous data of file is not
erased. Since we can add new data and modify existing data, this mode is also called update
mode.
"a+" (append+read): This mode is same as the "a" mode but in this mode we can also read the
data stored in the file. If the file doesn't exist, a new file is created and if the file
already exists then new data is appended at the end of existing data. We cannot modify
existing data in this mode.
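The difference between the "w", "a" and "r" modes can be tried out with two small helpers. This is a minimal sketch; the function names put_text() and get_line() are hypothetical and are not part of the C library.

```c
#include <stdio.h>

/* Opens path in the given mode, writes text, and closes the file.
   Returns 0 on success and -1 on failure. */
int put_text(const char *path, const char *mode, const char *text)
{
    FILE *fp = fopen(path, mode);
    if (fp == NULL)
        return -1;
    if (fputs(text, fp) == EOF) {
        fclose(fp);
        return -1;
    }
    return fclose(fp) == 0 ? 0 : -1;
}

/* Opens path in "r" mode and reads its first line (at most size - 1
   characters) into buf. Returns 0 on success and -1 on failure. */
int get_line(const char *path, char *buf, int size)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    if (fgets(buf, size, fp) == NULL)
        buf[0] = '\0'; /* empty file */
    fclose(fp);
    return 0;
}
```

If we call put_text("m.txt", "w", "old"), then put_text("m.txt", "w", "new"), and finally put_text("m.txt", "a", "+more"), get_line() reads back "new+more": the second "w" erased "old", while "a" kept the existing data and added to it.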
To open a file in binary mode we can append 'b' to the mode. Some compilers (mainly on DOS and
Windows) also accept 't' to request text mode, but 't' is not part of standard C, and since text
mode is the default mode, 't' is generally omitted while opening a file in text mode. For example:
"wb" → Binary file opened in write mode
"ab+" or "a+b" → Binary file opened in append mode (reading is also allowed)
"rt+" or "r+t" → Text file opened in update mode (where 't' is supported)
"w" → Text file opened in write mode
Errors in Opening Files
If an error occurs in opening file, then fopen() returns NULL. So we can check for any errors in
opening by checking the return value of fopen().
Errors in opening a file may occur due to various reasons, for example:
If we try to open a file in read or update mode, and the file doesn't exist or we do not
have read permission on that file.
If we try to create a file but there is no space on the disk or we don't have write
permission.
If we try to create a file that already exists and we don't have permission to delete
that file.
The operating system limits the number of files that can be open at a time and we are
trying to open more files than that number.
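Checking the return value of fopen() can be wrapped in a small helper, sketched below. The name open_or_report() is hypothetical. The sketch also assumes that fopen() sets errno on failure, which POSIX systems do but standard C does not strictly require.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Tries to open path in the given mode. On failure it prints the
   reason to stderr and returns NULL, otherwise it returns the stream. */
FILE *open_or_report(const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);
    if (fp == NULL)
        fprintf(stderr, "Cannot open %s: %s\n", path, strerror(errno));
    return fp;
}
```

The caller still has to test the result, for example with if ((fp = open_or_report("names.dat", "r")) == NULL) return 1; before using fp.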
We can give the full pathname to open a file. Suppose we want to open a file in DOS whose path is
"E:\booksdir\names.dat"; then we'll have to write:
fp = fopen("E:\\booksdir\\names.dat", "r");
Here we have used a double backslash because a single backslash inside a string is treated as
an escape character: '\b' and '\n' would be regarded as escape sequences if we used a single
backslash. In Unix, a single forward slash can be used.
Never give the mode in single quotes, since it is a string not a character constant.
fp = fopen("file.data", 'w'); /* Error */
Closing a file
The file that was opened using fopen() function must be closed when no more operations are to be
performed on it. After closing the file, connection between file and program is broken.
Declaration:
int fclose(FILE *fptr);
On closing the file, all the buffers associated with it are flushed i.e. all the data that is in the
buffer is written to the file. The buffers allocated by the system for the file are freed after the
file is closed so that these buffers can be available for other files.
Although all the files are closed automatically when the program terminates normally, sometimes
it may be necessary to close the file by fclose(), e.g. when we have to reopen the file in some other
mode or when we would otherwise exceed the number of open files permitted by the system. Moreover, it is
good practice to close a file explicitly by fclose() when no more operations are to be performed on that
file, because it makes clear to the reader of the program that the file is no longer in use.
fclose() returns EOF on error and 0 on success (EOF is a negative integer constant defined in stdio.h
and its value is usually -1). An error in fclose() may occur when there is not sufficient space on the
disk or when the disk has been taken out of the drive.
If more than one file is open, then we can close all of them by calling fclose() for each of
them.
fclose(fptr1);
fclose(fptr2);
....
....
Some compilers also provide a function fcloseall() that closes all the open files with a single
call. It is not part of standard C, so it may not be available on every system.
Declaration: int fcloseall(void);
Structure of a General File Program
#include <stdio.h>
int main(void)
{
    FILE *fp;
    fp = fopen("filename", "mode");
    ...
    ...
    ...
    fclose(fp);
    return 0;
} /* End of main */
The functions used for file I/O are:
Character I/O - fgetc(), fputc(), getc(), putc()
String I/O - fgets(), fputs()
Integer I/O - getw(), putw() (not part of standard C; provided by some compilers)
Formatted I/O - fscanf(), fprintf()
Record I/O - fread(), fwrite()
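/* Reads a string and writes its uppercase letters, lowercase letters,
   digits and remaining characters to four separate files. */
#include <stdio.h>
int main(void)
{
    FILE *a, *b, *c, *d;
    char str[30];
    int i;
    a = fopen("up.txt", "w"); /* uppercase letters */
    b = fopen("lw.txt", "w"); /* lowercase letters */
    c = fopen("dg.txt", "w"); /* digits */
    d = fopen("sp.txt", "w"); /* all other characters */
    if (a == NULL || b == NULL || c == NULL || d == NULL)
    {
        printf("Failed to open files.\n");
        return 1;
    }
    printf("Enter a string : ");
    if (scanf("%29s", str) != 1) /* read at most 29 characters plus '\0' */
        str[0] = '\0';
    for (i = 0; str[i] != '\0'; i++)
    {
        if (str[i] >= 'A' && str[i] <= 'Z')
            putc(str[i], a);
        else if (str[i] >= 'a' && str[i] <= 'z')
            putc(str[i], b);
        else if (str[i] >= '0' && str[i] <= '9')
            putc(str[i], c);
        else
            putc(str[i], d);
    }
    fclose(a);
    fclose(b);
    fclose(c);
    fclose(d);
    return 0;
}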
Program that reads data from one file and writes it to another file.
For this we should know about one thing:
EOF: EOF stands for "End of File". It is a special value that input functions such as fgetc()
return when the end of the file is reached (or an error occurs) while reading data from it. It is a
constant defined in the header file <stdio.h>.
#include <stdio.h>
int main(void) {
    FILE *sourceFile = fopen("source.txt", "r");
    FILE *destinationFile = fopen("destination.txt", "w");
    if (sourceFile == NULL || destinationFile == NULL) {
        printf("Failed to open files.\n");
        /* Close whichever file did open before giving up */
        if (sourceFile != NULL)
            fclose(sourceFile);
        if (destinationFile != NULL)
            fclose(destinationFile);
        return 1;
    }
    int ch; /* int, not char, so that EOF can be told apart from valid characters */
    /* Each call to fgetc() stores the next character in ch; the loop ends at EOF */
    while ((ch = fgetc(sourceFile)) != EOF) {
        fputc(ch, destinationFile);
    }
    fclose(sourceFile);
    fclose(destinationFile);
    printf("Data copied successfully.\n");
    return 0;
}
Formatted and Unformatted I/O functions
In C programming, input/output (I/O) functions are categorized into two types: formatted and unformatted
file handling functions. These functions serve different purposes and have distinct characteristics.
Here's an explanation of each type:
Formatted I/O functions:
Formatted I/O functions deal with data in a human-readable format. They allow you to read data
from or write data to files using a specified format. These functions provide more control over the
data's representation and formatting. Some commonly used formatted I/O functions in C include:
printf(): Used to format and print data to the standard output (usually the console).
scanf(): Used to read and parse formatted data from the standard input (usually the
keyboard).
fprintf(): Similar to printf(), but writes the formatted data to a specified file instead of
the standard output.
fscanf(): Similar to scanf(), but reads formatted data from a specified file instead of the
standard input.
Formatted I/O functions use format specifiers (such as %d, %f, %s, etc.) to specify the data type
and formatting options for input or output. These functions provide flexibility in how data is
displayed or interpreted.
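As an illustration of formatted file I/O, the sketch below writes one record with fprintf() and reads it back with fscanf(). The function names save_student() and load_student() and the record layout (name, age, marks) are assumptions made only for this example.

```c
#include <stdio.h>

/* Writes one record as text in the form "name age marks". */
int save_student(const char *path, const char *name, int age, double marks)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    fprintf(fp, "%s %d %.2f\n", name, age, marks);
    return fclose(fp) == 0 ? 0 : -1;
}

/* Reads the record back. name must have room for 20 characters.
   Returns 0 only if all three fields were converted. */
int load_student(const char *path, char name[20], int *age, double *marks)
{
    int fields;
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    /* %19s limits the name to 19 characters plus the terminating '\0' */
    fields = fscanf(fp, "%19s %d %lf", name, age, marks);
    fclose(fp);
    return fields == 3 ? 0 : -1;
}
```

The return value of fscanf() is the number of items successfully converted, so comparing it with 3 tells us whether the whole record was read.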
Unformatted I/O Functions:
Unformatted I/O functions deal with data in its raw binary format. They read or write data
directly as a sequence of bytes without any specific formatting. These functions are commonly used
for binary file handling, where data is read or written without any special formatting
considerations. Some commonly used unformatted I/O functions in C include:
fread(): Reads a specified number of bytes directly from a file.
fwrite(): Writes a specified number of bytes directly to a file.
getc(), fgetc(): Reads a single character (byte) from a file.
putc(), fputc(): Writes a single character (byte) to a file.
Unformatted I/O functions are useful when working with binary data, such as image files, sound
files, or structured data stored in binary format. They provide a straightforward way to read and
write raw data without any specific formatting requirements.
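A typical use of fwrite() and fread() is storing an array of structures in a binary file. The following is a minimal sketch; struct point and the functions save_points() and load_points() are hypothetical names introduced for illustration.

```c
#include <stdio.h>

struct point {
    int x;
    int y;
};

/* Writes n records to path in binary form. Returns the number of
   records actually written (0 if the file could not be opened). */
size_t save_points(const char *path, const struct point *pts, size_t n)
{
    size_t written;
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return 0;
    written = fwrite(pts, sizeof pts[0], n, fp);
    fclose(fp);
    return written;
}

/* Reads at most max records from path. Returns the number of
   records actually read (0 if the file could not be opened). */
size_t load_points(const char *path, struct point *pts, size_t max)
{
    size_t got;
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return 0;
    got = fread(pts, sizeof pts[0], max, fp);
    fclose(fp);
    return got;
}
```

Both functions return a count of records, not bytes, because fread() and fwrite() report the number of complete items transferred. A file written this way should be read back by a program built for the same kind of machine, since the in-memory layout of a structure can differ between systems.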
It's important to choose the appropriate type of I/O functions based on your requirements and the nature of
the data you are dealing with. Formatted I/O functions provide more control over formatting and are suitable
for human-readable data, while unformatted I/O functions work with raw binary data.
Till now, we have done file handling using library function calls. Now we will do the same using system calls.
But why?
The use of system calls, such as open(), close(), read(), etc., offers some key differences compared
to higher-level functions like fopen(), fclose(), fread(), etc. Here are the differences between
using system calls and higher-level functions:
Level of abstraction:
System calls provide a lower-level interface to the operating system, allowing direct
interaction with the underlying functionality. They offer finer-grained control and access
to system resources. In contrast, higher-level functions, such as those provided by the C
standard library, abstract away many low-level details and provide a more user-friendly
interface. They handle certain operations and error checking internally, simplifying the
code for common file operations.
Portability:
System calls can be platform-specific since they directly interact with the underlying
operating system. On the other hand, higher-level functions are often implemented in a
portable manner across different platforms. The C standard library functions provide a
consistent interface regardless of the underlying operating system, making it easier to
write cross-platform code.
Flexibility and control:
System calls offer more flexibility and control over file operations. They allow you to
use various flags and options to fine-tune the behavior of file operations, such as opening
modes, file permissions, and file positioning. System calls also enable advanced operations
like asynchronous I/O, file locking, and handling special file types. Higher-level functions
may not provide all of these capabilities or expose them directly.
Performance:
A single system call generally has lower overhead than the corresponding higher-level function.
This is because higher-level functions often involve additional layers of abstraction and may
perform additional checks and operations. System calls directly interact with the operating
system kernel, bypassing some of these overheads. However, every system call has a cost of its
own, so for many small reads or writes the buffering done by higher-level functions is often
faster, and the difference is negligible for many applications unless low-level optimizations
are required.
System calls
System calls are the calls that a program makes to the system kernel to request services to which
the program does not have direct access.
For example, they provide access to input and output devices such as monitors and keyboards. In C
we can use functions such as creat(), open(), read(), write(), etc. to make input/output system
calls.
Important terminology
File descriptor
In operating systems, a file descriptor is an abstract representation of an opened file or I/O
stream. It is a non-negative integer that uniquely identifies an open file within a process.
File descriptors are used to perform various operations on files, such as reading, writing, and
closing.
When you open a file using the open() system call, a file descriptor is
returned. You can think of it as a handle or reference to the opened file. This file descriptor
is then used in subsequent system calls to perform operations on the file.
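The sketch below shows that file descriptors are plain integers. It is only an illustration for POSIX systems: the function name descriptors_are_distinct() is hypothetical, and the exact numbers printed for fd1 and fd2 depend on which descriptors are already in use in the process.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Opens the same file twice and prints the descriptors obtained.
   Returns 1 if both opens succeeded and gave different descriptors. */
int descriptors_are_distinct(const char *path)
{
    int fd1 = open(path, O_RDONLY);
    int fd2 = open(path, O_RDONLY);
    int ok = (fd1 >= 0 && fd2 >= 0 && fd1 != fd2);

    /* Descriptors 0, 1 and 2 are the standard input, output and error */
    printf("standard descriptors: %d %d %d\n",
           STDIN_FILENO, STDOUT_FILENO, STDERR_FILENO);
    printf("fd1 = %d, fd2 = %d\n", fd1, fd2);

    if (fd1 >= 0)
        close(fd1);
    if (fd2 >= 0)
        close(fd2);
    return ok;
}
```

Each successful open() gives the process a new descriptor, even when the same file is opened again.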
Create
The creat() function is used to create a new empty file in C (note the spelling: the function is
named creat, without a final 'e'). We can specify the permissions and the
name of the file which we want to create using the creat() function. It is declared in the
<fcntl.h> header file, and symbolic permission constants such as S_IRUSR are defined in the
<sys/stat.h> header file.
Syntax of creat()
int creat(const char *filename, mode_t mode);
Parameters
filename: name of the file which you want to create
mode: indicates permissions of the new file
Return value
returns the first unused file descriptor (generally 3 when it is the first file opened in the process,
because descriptors 0, 1 and 2 are reserved for standard input, output and error)
returns -1 on error
How creat() works in OS
Creates a new empty file on disk (if the file already exists, it is truncated to zero length)
Creates a file table entry
Sets the first unused file descriptor to point to the file table entry
Returns the file descriptor used, or -1 upon failure
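A minimal sketch of creating an empty file on a POSIX system follows. The function name make_empty_file() is hypothetical, and the permission value 0644 (read and write for the owner, read-only for everyone else) is just one common choice.

```c
#include <fcntl.h>
#include <unistd.h>

/* Creates path as an empty file (or empties it if it already exists)
   with permissions rw-r--r--. Returns 0 on success and -1 on error. */
int make_empty_file(const char *path)
{
    int fd = creat(path, 0644);
    if (fd == -1)
        return -1;
    /* creat() leaves the file open for writing, so close it again */
    return close(fd);
}
```

A call to creat(path, mode) behaves like open(path, O_WRONLY | O_CREAT | O_TRUNC, mode), which is why many programs use open() directly instead.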
Open
The open() function in C is used to open the file for reading, writing, or both.
It is also capable of creating the file if it does not exist.
It is declared in the <fcntl.h> header file, which also defines the flags that are passed as
arguments.
Syntax of open()
int open(const char *path, int flags);
int open(const char *path, int flags, mode_t mode);
Parameters
path: path to the file which we want to open.
Use an absolute path beginning with "/" when the file is not in the current working directory
of the running program.
A relative path, such as only the file name with extension, is looked up starting from the
current working directory of the running program (not the directory of the C source file).
flags: specifies how you want to open the file. Exactly one of O_RDONLY, O_WRONLY and O_RDWR must
be given; the remaining flags can be combined with it using the bitwise OR operator (|).
Flag       Description
O_RDONLY   Opens the file in read-only mode.
O_WRONLY   Opens the file in write-only mode.
O_RDWR     Opens the file in read and write mode.
O_CREAT    Creates the file if it doesn't exist.
O_APPEND   Places the file offset at the end of the file before each write.
O_TRUNC    Truncates the file to zero length if it exists.
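The flags are combined with the bitwise OR operator. The sketch below contrasts O_TRUNC with O_APPEND on a POSIX system; the helper names overwrite_text() and append_text() are hypothetical, and for brevity a short write is simply reported as a failure.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Opens path with the given flags, writes text, and closes the file.
   Returns 0 if every byte was written and -1 otherwise. */
static int write_with_flags(const char *path, int flags, const char *text)
{
    size_t len = strlen(text);
    ssize_t n;
    int fd = open(path, flags, 0644); /* 0644 is used only if the file is created */
    if (fd == -1)
        return -1;
    n = write(fd, text, len);
    close(fd);
    return (n == (ssize_t)len) ? 0 : -1;
}

/* Replaces the contents of path with text (similar to fopen mode "w"). */
int overwrite_text(const char *path, const char *text)
{
    return write_with_flags(path, O_WRONLY | O_CREAT | O_TRUNC, text);
}

/* Adds text at the end of path (similar to fopen mode "a"). */
int append_text(const char *path, const char *text)
{
    return write_with_flags(path, O_WRONLY | O_CREAT | O_APPEND, text);
}
```

After overwrite_text("f.txt", "abc") followed by append_text("f.txt", "de") the file holds the 5 bytes "abcde"; another overwrite_text("f.txt", "x") leaves only 1 byte.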
Example of open
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
int main(void) {
    int fileDescriptor = open("example.txt", O_RDONLY);
    if (fileDescriptor == -1) {
        perror("open"); // Report why the file couldn't be opened
        return 1;       // Do not continue with an invalid descriptor
    }
    // Perform operations on the opened file
    close(fileDescriptor); // Close the file descriptor
    return 0;
}
Close
The close() function in C tells the operating system that you are done with a file descriptor and
closes the file pointed by the file descriptor. It is defined inside <unistd.h> header
file.
Parameter
fd: File descriptor of the file that you want to close.
Return value
0 on success
-1 on error
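Example:
#include <fcntl.h>  // open() and O_RDONLY
#include <stdio.h>  // perror()
#include <unistd.h> // close()
int main(void) {
    int fileDescriptor = open("example.txt", O_RDONLY);
    if (fileDescriptor == -1) {
        perror("open"); // Report why the file couldn't be opened
        return 1;
    }
    // Perform operations on the opened file
    if (close(fileDescriptor) == -1) { // Close the file descriptor
        perror("close");
        return 1;
    }
    return 0;
}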
Read
From the file indicated by the file descriptor fd, the read() function reads up to cnt
bytes of input into the memory area indicated by buf. A successful read() updates the access
time for the file. The read() function is also declared inside the <unistd.h> header file.
Parameters
fd: file descriptor of the file from which data is to be read.
buf: buffer into which the data is read
cnt: number of bytes to read (at most the length of the buffer)
Return value
returns the number of bytes read on success
returns 0 on reaching the end of the file
returns -1 on error
returns -1 (with errno set to EINTR) if interrupted by a signal before any data was read
Important points
buf needs to point to a valid memory location with a length not smaller than the specified
size; otherwise the read overflows the buffer.
fd should be a valid file descriptor; if fd is invalid (for example negative or already closed),
read() fails with an error.
cnt is the requested number of bytes to read, while the return value is the actual number of
bytes read. Sometimes the read system call reads fewer bytes than cnt.
Example:
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#define BUFFER_SIZE 1024
int main(void) {
    int fileDescriptor = open("example.txt", O_RDONLY);
    if (fileDescriptor == -1) {
        perror("open"); // Report why the file couldn't be opened
        return 1;
    }
    char buffer[BUFFER_SIZE];
    ssize_t bytesRead = read(fileDescriptor, buffer, BUFFER_SIZE);
    if (bytesRead == -1) {
        perror("read"); // Report why reading failed
        close(fileDescriptor);
        return 1;
    }
    // Process the first bytesRead bytes of buffer (it is not '\0'-terminated)
    close(fileDescriptor); // Close the file descriptor
    return 0;
}
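Because a single read() may return fewer bytes than requested, a whole file is usually read in a loop that stops when read() returns 0. The following is a minimal sketch of such a loop for POSIX systems; the function name count_bytes_fd() and the buffer size of 256 are assumptions made for this example.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Reads the file at path to the end and returns the total number of
   bytes read, or -1 if the file cannot be opened or a read fails. */
long count_bytes_fd(const char *path)
{
    char buf[256];
    long total = 0;
    ssize_t n;
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    for (;;) {
        n = read(fd, buf, sizeof buf);
        if (n > 0) {
            total += n;          /* got n bytes, possibly fewer than requested */
        } else if (n == 0) {
            break;               /* end of file */
        } else if (errno != EINTR) {
            close(fd);           /* a real error */
            return -1;
        }                        /* on EINTR simply try again */
    }
    close(fd);
    return total;
}
```

In a real program the body of the loop would process the n bytes in buf instead of only counting them.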
Write
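Writes cnt bytes from buf to the file or socket associated with fd. cnt should not be greater than
SSIZE_MAX (defined in the limits.h header file); for larger values the result is implementation-defined.
If cnt is zero and no error is detected, write() on a regular file simply returns 0 without
any other effect.
The write() function is also declared inside the <unistd.h> header file.
Parameters
fd: file descriptor
buf: buffer containing the data to be written
cnt: number of bytes to write
Return value
returns the number of bytes written on success (this may be less than cnt, for example when the disk is nearly full)
returns 0 when nothing was written (for example when cnt is 0)
returns -1 on error
returns -1 (with errno set to EINTR) if interrupted by a signal before any data was written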
Example:
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
int main(void) {
    int fileDescriptor = open("example.txt", O_WRONLY | O_CREAT, 0644);
    if (fileDescriptor == -1) {
        perror("open"); // Report why the file couldn't be opened
        return 1;
    }
    char data[] = "Hello, world!";
    // sizeof(data) - 1 leaves out the terminating '\0'
    ssize_t bytesWritten = write(fileDescriptor, data, sizeof(data) - 1);
    if (bytesWritten == -1) {
        perror("write"); // Report why writing failed
        close(fileDescriptor);
        return 1;
    }
    close(fileDescriptor); // Close the file descriptor
    return 0;
}
0644: File permission mode (the third argument of open(), used when O_CREAT is given). This octal value
specifies read and write permissions for the owner and read-only permissions for the group and others.
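The system calls described above can be combined into a file copy routine, the low-level counterpart of a copy program written with fgetc() and fputc(). This is a sketch for POSIX systems under stated assumptions: the names write_fully() and copy_file_fd() are hypothetical, the buffer size 4096 is arbitrary, and the destination is created with mode 0644.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Writes all n bytes of buf, repeating write() after a partial write.
   Returns 0 on success and -1 on error. */
static int write_fully(int fd, const char *buf, size_t n)
{
    while (n > 0) {
        ssize_t w = write(fd, buf, n);
        if (w == -1) {
            if (errno == EINTR)
                continue;        /* interrupted, try again */
            return -1;
        }
        buf += w;                /* skip the part already written */
        n -= (size_t)w;
    }
    return 0;
}

/* Copies the file src to dst. Returns 0 on success and -1 on error. */
int copy_file_fd(const char *src, const char *dst)
{
    char buf[4096];
    ssize_t n;
    int status = 0;
    int out;
    int in = open(src, O_RDONLY);
    if (in == -1)
        return -1;
    out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out == -1) {
        close(in);
        return -1;
    }
    while ((n = read(in, buf, sizeof buf)) != 0) {
        if (n == -1) {
            if (errno == EINTR)
                continue;        /* interrupted, try again */
            status = -1;
            break;
        }
        if (write_fully(out, buf, (size_t)n) == -1) {
            status = -1;
            break;
        }
    }
    close(in);
    if (close(out) == -1)
        status = -1;
    return status;
}
```

Unlike the stream version, this code has to deal with partial reads and writes itself, which is one of the details that the higher-level library functions hide from the programmer.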