File Organization

File Structure Terminology

Record (Group/Segment): A collection of information items about a particular entity. E.g. a collection of information about a student in a college.
Field/Item: An item of a record is a unit of meaningful information about an entity. E.g. Name, Id no, Address of a student.
File: A collection of records about a set of entities that have certain aspects in common, organized for some particular purpose, is called a file. E.g. the collection of all students' records makes a file.
Key: A record item or field that uniquely identifies a record in a file is called a key. E.g. the Id_no of a student in the student file.
Database: If a set of files is used by application programs for a particular application area, and these files exhibit associations or relationships between their records, then such a collection of files is called a database. E.g. a student database with files/tables such as student information, result, attendance, registration, placement etc., where all these data are related to and associated with one another.
Transaction: A particular operation involving a record or set of records is called a transaction. E.g. "Delete all students of batch = 00".
Fields, records, files and databases are logical terms.
Block/Physical record: Several logical records grouped together to form a single physical unit stored on a physical device are called a block or physical record.
Serial-access device: A device on which data is stored serially and can only be retrieved serially, so that the access time of any data is proportional to its location on the device, is called a serial-access device. These devices are cheap compared to direct-access devices. E.g. magnetic tape.
Direct-access device: Direct-access devices, also known as random-access devices, are storage devices that allow data to be accessed directly, regardless of its physical location on the device. The access time for retrieving a record is roughly the same for every record and does not depend on the sequential order of the data. These devices are costly. E.g. magnetic disks (hard disk drives) and solid-state drives (SSDs).
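The grouping of logical records into blocks (physical records) described above can be sketched in Python. This is a minimal illustration, assuming every block holds the same fixed number of logical records; the names `block_records`, `locate` and `blocking_factor` (the number of logical records per block) are hypothetical and chosen only for this example.

```python
from typing import List, Tuple, TypeVar

T = TypeVar("T")

def block_records(records: List[T], blocking_factor: int) -> List[List[T]]:
    """Group logical records into physical blocks of at most blocking_factor records."""
    if blocking_factor < 1:
        raise ValueError("blocking_factor must be at least 1")
    return [records[i:i + blocking_factor]
            for i in range(0, len(records), blocking_factor)]

def locate(record_index: int, blocking_factor: int) -> Tuple[int, int]:
    """Return (block number, offset within block) of the i-th logical record (0-based)."""
    return divmod(record_index, blocking_factor)

students = ["S01", "S02", "S03", "S04", "S05", "S06", "S07"]
blocks = block_records(students, blocking_factor=3)
print(blocks)        # [['S01', 'S02', 'S03'], ['S04', 'S05', 'S06'], ['S07']]
print(locate(4, 3))  # (1, 1): S05 is the second record of block 1
```

One read of a block from the device brings in several logical records at once, which is why blocking reduces the number of physical transfers.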

Sequential file organization

In sequential file organization, records are stored one after another in a fixed order, usually sorted on a key field (or simply in order of insertion). Records are accessed sequentially, starting from the beginning of the file. Sequential access is efficient, but accessing or modifying a specific record at random can be time-consuming.
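The trade-off can be illustrated with a small sketch. It assumes, for illustration only, a file holding one "id,name" record per line in ascending id order; `io.StringIO` stands in for a file on tape or disk, and `read_all` and `find` are hypothetical helper names. Reading every record is a single pass, but locating one record costs reads proportional to its position.

```python
import io
from typing import List, Optional, Tuple

# A toy sequential file: one "id,name" record per line, stored in ascending id order.
data = io.StringIO("101,Asha\n104,Ravi\n107,Meena\n110,Kiran\n")

def read_all(f: io.StringIO) -> List[List[str]]:
    """Batch processing: read every record from the start, in stored order."""
    f.seek(0)
    return [line.rstrip("\n").split(",") for line in f]

def find(f: io.StringIO, key: int) -> Tuple[Optional[str], int]:
    """Locate one record by key; returns (name or None, number of records read)."""
    f.seek(0)
    reads = 0
    for line in f:
        reads += 1
        rec_id, name = line.rstrip("\n").split(",")
        if int(rec_id) == key:
            return name, reads
        if int(rec_id) > key:  # the file is sorted by key, so we can stop early
            return None, reads
    return None, reads

print(find(data, 110))  # ('Kiran', 4): reached only after reading every earlier record
print(find(data, 105))  # (None, 3): stopped at 107
```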

Storing sequential file

Sequential files can be stored on serial-access as well as direct-access devices.

Advantages

Faster access to next record.
Access time is very good when records are processed in the order in which they are stored.
Simplicity

Disadvantages

If the pattern of access does not match the order in which records are stored, access time becomes very high.
Data can be accessed sequentially only.

Random/Direct File organization

When the records in a file are arranged so that any individual record can be accessed directly whenever it is needed, the arrangement is called direct file organization.
In this file organization there is a direct relationship between the key used to identify a particular record and the location of that record in the file.
The records do not necessarily appear physically in sorted order by their key values as in sequential files.
When a relative (direct) file is created, the relationship used to translate key values into the physical addresses of records in the file is assigned. This relationship is called the mapping function R.
Direct files are typically used when only selected records need to be accessed from a file, rather than retrieving all records simultaneously, such as in interactive processing scenarios. For instance, in an online banking system, a user may only wish to access their own account.
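A mapping function R can be sketched with fixed-length records in a byte file. This is a minimal illustration under several assumptions not stated in the notes: R(key) = key mod NUM_SLOTS, collisions are resolved by probing the next slot (linear probing), keys are non-negative integers, and names fit in 20 bytes. `io.BytesIO` stands in for a file on a direct-access device, and all function names are hypothetical.

```python
import io
import struct
from typing import Optional, Tuple

RECORD = struct.Struct("<i20s")  # fixed-length record: 4-byte key + 20-byte name
NUM_SLOTS = 7                    # hypothetical file size (number of record slots)
EMPTY = -1                       # key value marking an unused slot (assumes keys >= 0)

def R(key: int) -> int:
    """Hypothetical mapping function: key -> relative slot number."""
    return key % NUM_SLOTS

def create() -> io.BytesIO:
    """Pre-allocate the file with NUM_SLOTS empty records."""
    f = io.BytesIO()
    for _ in range(NUM_SLOTS):
        f.write(RECORD.pack(EMPTY, b""))
    return f

def _read_slot(f: io.BytesIO, slot: int) -> Tuple[int, bytes]:
    f.seek(slot * RECORD.size)  # jump straight to the slot's byte offset
    return RECORD.unpack(f.read(RECORD.size))

def insert(f: io.BytesIO, key: int, name: str) -> int:
    """Write the record at R(key); on a collision, probe the next slot."""
    for i in range(NUM_SLOTS):
        slot = (R(key) + i) % NUM_SLOTS
        stored_key, _ = _read_slot(f, slot)
        if stored_key in (EMPTY, key):  # free slot, or same key (update in place)
            f.seek(slot * RECORD.size)
            f.write(RECORD.pack(key, name.encode()))
            return slot
    raise RuntimeError("file is full")

def lookup(f: io.BytesIO, key: int) -> Optional[str]:
    """Go straight to R(key); only collisions cause extra reads."""
    for i in range(NUM_SLOTS):
        slot = (R(key) + i) % NUM_SLOTS
        stored_key, raw = _read_slot(f, slot)
        if stored_key == EMPTY:
            return None
        if stored_key == key:
            return raw.rstrip(b"\0").decode()
    return None

accounts = create()
print(insert(accounts, 1001, "Asha"))  # 0, since 1001 % 7 == 0
print(insert(accounts, 1008, "Ravi"))  # 1, since slot 0 is taken
print(lookup(accounts, 1008))          # Ravi
```

Other mapping functions (e.g. using the key directly as a relative record number) fit the same structure; only R and the collision rule change.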

Storing direct file

Direct files can only be stored on direct-access devices.

Advantages

Ability to access individual record directly, i.e. a record can be retrieved, inserted, modified or deleted without affecting other records.
Access time for one record at a time is very good.

Disadvantages

Performance is poor when all records have to be accessed together, e.g. in batch processing.

Index Sequential file organization

When records need to be accessed in both ways, sequentially in order of some key value and individually by some key value, an effective way to organize them is index sequential file organization.
It supports the combination of access types provided by a sequential file and a relative (direct) file.
It is used when there is a need for both batch and interactive processing. E.g. a student management system must calculate the result of every student (batch processing) and must also find, for example, the address of a single detained student in order to send that student a letter (interactive processing). The file can be stored sequentially in order of student id, and an index table keyed on student id is also provided that maps each student id directly to its record.
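The student example can be sketched as follows. This is a minimal in-memory illustration: a Python list sorted by `student_id` stands in for the sequential data file, a dictionary stands in for the index table, and the field names, sample data, and `pass_mark` of 40 are all hypothetical.

```python
from typing import Dict, List, NamedTuple

class Student(NamedTuple):
    student_id: int
    name: str
    address: str
    marks: int

# Data file: records stored sequentially, in ascending student_id order.
data_file: List[Student] = [
    Student(1, "Asha", "12 Hill Rd", 78),
    Student(2, "Ravi", "4 Lake St", 35),
    Student(3, "Meena", "9 Park Ave", 64),
]

# Index table: student_id -> position of the record in the data file.
index: Dict[int, int] = {s.student_id: pos for pos, s in enumerate(data_file)}

def results(pass_mark: int = 40) -> List[str]:
    """Batch processing: walk the whole file sequentially."""
    return [f"{s.student_id}:{'PASS' if s.marks >= pass_mark else 'FAIL'}"
            for s in data_file]

def address_of(student_id: int) -> str:
    """Interactive processing: jump straight to one record via the index."""
    return data_file[index[student_id]].address

print(results())      # ['1:PASS', '2:FAIL', '3:PASS']
print(address_of(2))  # 4 Lake St
```

Here the index is dense (one entry per record), matching the description above; practical index sequential files often use a sparser index with one entry per block.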

Advantages

Ability to access an individual record directly as well as all records sequentially.

Multi key file organization

When the records of a file can be accessed on the basis of more than one key, the organization is called multikey file organization.
This file organization is often needed. E.g. a banking system keeps account records in a file. An account holder needs account information, which is accessed by account number, while a loan officer needs the account records with a given value of overdue limit. So we need to provide multiple access paths to the records, one for each need. Generally these files are index sequential files in which the file is stored sequentially on the primary key and several index tables are provided on different keys. Basically there are 2 approaches for implementing multikey file organization:
1. Inverted file organization
2. Multi list file organization

1. Inverted file organization

In this file organization, a key's inversion index contains all of the values that the key currently has in the records of the data file. Each key-value entry in the inversion index points to all the data records that have the corresponding value. The data file is said to be inverted on that key.
The inversion index is kept sorted on key value so that binary search can be used to find an entry, and through it the records. Whenever a record is added to the data file, a corresponding entry must be made in the inversion index.
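An inversion index on the overdue limit from the banking example can be sketched like this. It is a minimal illustration assuming a single inverted key, records addressed by their position in a Python list, and hypothetical field and function names; the standard `bisect` module provides the binary search over the sorted key values.

```python
import bisect
from typing import Dict, List

data_file: List[Dict] = []      # data records; a record's address is its list position
inv_keys: List[int] = []        # distinct values of the inverted key, kept sorted
inv_ptrs: List[List[int]] = []  # inv_ptrs[i]: addresses of all records with inv_keys[i]

def add_record(record: Dict, key_field: str) -> None:
    """Append to the data file and make the matching entry in the inversion index."""
    address = len(data_file)
    data_file.append(record)
    value = record[key_field]
    i = bisect.bisect_left(inv_keys, value)
    if i < len(inv_keys) and inv_keys[i] == value:
        inv_ptrs[i].append(address)  # value already indexed: add one more pointer
    else:
        inv_keys.insert(i, value)    # new value: insert it, keeping the index sorted
        inv_ptrs.insert(i, [address])

def find(value: int) -> List[Dict]:
    """Binary-search the inversion index, then follow its pointers to the records."""
    i = bisect.bisect_left(inv_keys, value)
    if i < len(inv_keys) and inv_keys[i] == value:
        return [data_file[a] for a in inv_ptrs[i]]
    return []

for acc, limit in [(501, 5000), (502, 10000), (503, 5000)]:
    add_record({"account_no": acc, "overdue_limit": limit}, "overdue_limit")

print([r["account_no"] for r in find(5000)])  # [501, 503]
```

Inverting the same file on a second key would simply mean keeping a second pair of sorted keys and pointer lists.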

2. Multi list file organization

In multi list file organization, the index contains all values that the secondary key has in the data file, just as in an inverted file. The difference is that the index entry for a secondary-key value is a pointer to only the first data record with that key value. That record contains a pointer to the second record with the same key value, and so on. Thus there is a linked list of data records for each value of the secondary key. Multi list chains are usually bidirectional, and occasionally circular, to make update operations easier.
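For contrast with the inverted file, a bidirectional multi list on the overdue limit can be sketched as below. This is a minimal illustration under assumptions of its own: records are addressed by list position, each new record is linked in at the front of its chain, and the field and function names are hypothetical. The index holds only the head of each chain; the remaining pointers live inside the data records.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Record:
    account_no: int
    overdue_limit: int             # the secondary key
    next: Optional[int] = None     # address of the next record with the same key value
    prev: Optional[int] = None     # address of the previous record (bidirectional chain)

data_file: List[Record] = []       # a record's address is its list position
head: Dict[int, int] = {}          # multi list index: key value -> first record address

def add_record(account_no: int, overdue_limit: int) -> None:
    """Append the record and link it in at the front of its key value's chain."""
    address = len(data_file)
    rec = Record(account_no, overdue_limit)
    old_head = head.get(overdue_limit)
    if old_head is not None:
        rec.next = old_head
        data_file[old_head].prev = address
    data_file.append(rec)
    head[overdue_limit] = address

def find(overdue_limit: int) -> List[int]:
    """Start at the index entry and follow the chain through the data records."""
    result: List[int] = []
    address = head.get(overdue_limit)
    while address is not None:
        result.append(data_file[address].account_no)
        address = data_file[address].next
    return result

add_record(501, 5000)
add_record(502, 10000)
add_record(503, 5000)
print(find(5000))  # [503, 501]: newest first, because records join at the chain head
```

The `prev` pointers are not needed for searching; they let a record be unlinked from its chain on deletion without rescanning from the head.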