File Organization
File Structure Terminology
- Record (Group/Segment): A collection of information items about a particular entity.
E.g. a collection of information about a student in a college.
- Field/Item: A unit of meaningful information about an entity, stored as one item of a record. E.g.
name, ID number, and address of a student.
- File: A collection of records involving a set of entities with certain aspects in common,
organized for some particular purpose. E.g. the collection of all students' records makes up a
file.
- Key: A record item or field which uniquely identifies a record in a file is called a key.
E.g. Id_no of a student for student record.
- Database: If a set of files is used by the application programs for some particular
application area, and these files exhibit certain associations or relationships between their
records, then such a collection of files is called a database. E.g. a student database
which has files/tables such as student information, result, attendance, registration, placement, etc.,
where all these different data have relationships and associations between them.
- Transaction: A particular operation involving a record or a set of records is called a
transaction. E.g. "Delete all students of batch = 00".
- Fields, records, files and databases are logical terms.
- Block/Physical record: Several logical records can be grouped together to form a single
physical unit to be stored on a physical device. This unit is called a block or physical record.
- Serial-access device: A device on which data is stored serially and can be retrieved only
serially, so that the access time of any data item is proportional to its position on the device, is
called a serial-access device. These devices are cheap compared to direct-access devices. E.g. magnetic
tape.
- Direct-access device: Direct-access devices, also known as random-access devices, are
storage devices that allow data to be accessed directly, regardless of its physical location on
the device. This means that the access time for retrieving any record is roughly constant and
does not depend on the sequential order of the data. These devices are costlier than serial-access
devices. E.g. magnetic disks, hard disk drives, solid-state drives (SSDs).
- File organization: The technique used to represent and store the records in a file is called
file organization.
- The four fundamental file organizations are:
- Sequential
- Direct/Random
- Index sequential
- Multi key
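The logical terms above (field, record, file, key) and the physical notion of a block can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Student` class, the sample values, and the helper names `is_key` and `to_blocks` are hypothetical and do not come from any particular system.

```python
from dataclasses import dataclass


@dataclass
class Student:
    """One record: a collection of fields about one entity (a student)."""
    id_no: int      # field intended to act as the key
    name: str       # field
    address: str    # field


def is_key(file, field_name):
    """Return True if the field uniquely identifies every record in the file."""
    values = [getattr(record, field_name) for record in file]
    return len(values) == len(set(values))


def to_blocks(file, records_per_block):
    """Group logical records into blocks (physical records) of a fixed size."""
    return [file[i:i + records_per_block]
            for i in range(0, len(file), records_per_block)]


# A file: a collection of records of the same kind (made-up sample data).
student_file = [
    Student(101, "Asha", "Pune"),
    Student(102, "Ravi", "Delhi"),
    Student(103, "Meera", "Pune"),
    Student(104, "Ravi", "Mumbai"),
    Student(105, "John", "Goa"),
]

blocks = to_blocks(student_file, 2)
```

Here `id_no` qualifies as a key because no two records share a value, while `name` does not because two students are called "Ravi". Grouping five records two per block gives three blocks, the last one only partly filled.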
Sequential file organization
- In sequential file organization, data is stored in a sequential manner, one record after another,
based on the order of insertion. Records are accessed sequentially, starting from the beginning of
the file. Sequential access is efficient, but random access or modifications to specific records can
be time-consuming.
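A minimal sketch of this behaviour, assuming an in-memory text file (`io.StringIO`) with one comma-separated record per line. The function names `append_record` and `find_sequential` and the sample data are hypothetical; the point is only that a lookup has to read every record that precedes the one it wants.

```python
import io


def append_record(f, id_no, name):
    """Insert a record by writing it after the last record in the file."""
    f.seek(0, io.SEEK_END)
    f.write(f"{id_no},{name}\n")


def find_sequential(f, id_no):
    """Scan from the beginning; return (name, number of records read)."""
    f.seek(0)
    reads = 0
    for line in f:
        reads += 1
        rec_id, name = line.rstrip("\n").split(",", 1)
        if int(rec_id) == id_no:
            return name, reads
    return None, reads


seq_file = io.StringIO()
for id_no, name in [(103, "Meera"), (101, "Asha"), (102, "Ravi")]:
    append_record(seq_file, id_no, name)   # stored in order of insertion
```

Looking up the first record inserted costs one read, while looking up the last one, or a key that is absent, costs a read of the whole file.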
Storing sequential file
- Sequential files can be stored on serial-access as well as direct-access devices.
Advantages
- Fast access to the next record in sequence.
- Access time is very good when records are processed in the order in which they are stored.
- Simplicity
Disadvantages
- If the pattern of access does not match the record ordering, access time becomes very high.
- Records can be accessed only sequentially.
Random/Direct File organization
- When the records in a file are arranged so that individual records can be accessed directly
whenever they are needed, the arrangement is called direct file organization.
- In this file organization there exists a direct relationship between a key used to identify the
particular record and the location of the record in a file.
- The records do not necessarily appear physically in sorted order by their key values as in
sequential files.
- When a direct (relative) file is created, the relationship used to translate between key values and
the physical addresses of records in the file is fixed. That relationship is called the mapping function R.
- Direct files are typically used when only selected records need to be accessed from a file, rather
than retrieving all records simultaneously, such as in interactive processing scenarios. For
instance, in an online banking system, a user may only wish to access their own account.
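One simple form the mapping function R can take is sketched below, assuming fixed-length records and keys that fall in a small known range, so that the key determines the slot number directly. The record layout, `BASE_ID`, `N_SLOTS`, and the function names are assumptions made for the illustration; real systems often use other mappings.

```python
import io
import struct

RECORD = struct.Struct("<i20s")  # assumed layout: 4-byte id_no + 20-byte name
BASE_ID = 100                    # assumed smallest key value in the file
N_SLOTS = 10                     # assumed number of record slots


def R(id_no):
    """Mapping function: translate a key value into a byte address."""
    slot = id_no - BASE_ID
    if not 0 <= slot < N_SLOTS:
        raise KeyError(id_no)
    return slot * RECORD.size


def create_file():
    """Create an empty direct file with every slot zero-filled."""
    return io.BytesIO(bytes(N_SLOTS * RECORD.size))


def write_record(f, id_no, name):
    """Store a record at the address given by R, touching no other record."""
    f.seek(R(id_no))
    f.write(RECORD.pack(id_no, name.encode("utf-8")))


def read_record(f, id_no):
    """Fetch one record directly; return None if the slot is empty."""
    f.seek(R(id_no))
    stored_id, raw = RECORD.unpack(f.read(RECORD.size))
    if stored_id != id_no:
        return None
    return raw.rstrip(b"\x00").decode("utf-8")


direct_file = create_file()
write_record(direct_file, 103, "Meera")
write_record(direct_file, 101, "Asha")
```

Each read or write performs a single seek to the computed address, so the records need not be stored in key order and no other record is read along the way.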
Storing direct file
- Direct files can only be stored on direct-access devices.
Advantages
- Ability to access an individual record directly, i.e. a record can be retrieved, inserted, modified
or deleted without affecting other records.
- Access time for one record at a time is very good.
Disadvantages
- Performance is poor when all the records have to be accessed in one pass.
Index Sequential file organization
- Index sequential file organization is an effective way to organize a collection of records when
both kinds of access are needed: accessing the records sequentially by some key value and accessing
individual records directly by some key.
- It supports a combination of the access types supported by a sequential file and a relative file.
- It is used when there is a need for both batch and interactive processing. E.g. in a student management
system we have to calculate the result for every student, and we also have requirements such as finding the
address of only the student who is detained in order to send a letter. So the file can be stored
sequentially based on student ID, and an index table keyed on student ID is also provided, which
maps a student ID directly to its record.
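The student example can be sketched as follows, assuming the data file is a list kept sorted by student ID and the index table is a parallel sorted list of keys searched with `bisect`. The field names, the sample values, and the `pass_mark` threshold are hypothetical.

```python
from bisect import bisect_left

# Data file: records stored sequentially in student-ID order (made-up data).
data_file = [
    {"id_no": 101, "name": "Asha", "marks": 72, "address": "Pune"},
    {"id_no": 103, "name": "Meera", "marks": 35, "address": "Goa"},
    {"id_no": 107, "name": "Ravi", "marks": 58, "address": "Delhi"},
]

# Index table: entry i holds the key of the record at position i.
index_keys = [rec["id_no"] for rec in data_file]


def all_results(pass_mark=40):
    """Batch processing: visit every record in key order."""
    return [(rec["id_no"], rec["marks"] >= pass_mark) for rec in data_file]


def lookup(id_no):
    """Interactive processing: use the index to reach one record directly."""
    pos = bisect_left(index_keys, id_no)
    if pos < len(index_keys) and index_keys[pos] == id_no:
        return data_file[pos]
    return None
```

`all_results()` walks the file sequentially to compute a result for every student, while `lookup(103)` goes through the index to fetch only the record whose address is needed.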
Advantages
- Ability to access an individual record directly as well as all the records sequentially.
Multi key file organization
- When the records of a file are made accessible by more than one key, the arrangement is called
multikey file organization.
- This file organization is needed often. E.g. in a banking system we keep records of accounts in a
file. An account holder needs account information, which can be accessed through the account number, while
a loan officer needs the account records with a given value of overdue limit. So we need to provide access
paths to the records based on different needs. Generally these files are index sequential files in which
the file is stored sequentially based on the primary key and more than one index table is provided, based on
different keys. Basically there are 2 approaches for implementing multikey file organization.
- Inverted file organization
- Multi list file organization
1. Inverted file organization
- In this file organization a key's inversion index contains all of the values that the key
presently has in the records of the data file. Each key-value entry in the inversion index
points to all the data records that have the corresponding value. The data file is said to be
inverted on that key.
- The inversion index is kept sorted on the key values so that binary search can be applied to find
an index entry and, through it, the records. Whenever a record is added to the data file, its
corresponding entry has to be made in the inversion index.
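A minimal sketch of an inverted file, assuming the banking example with accounts inverted on the overdue limit. The inversion index is modelled as two parallel lists, sorted key values and their pointer lists, and searched with `bisect`. All names and sample values are hypothetical.

```python
from bisect import bisect_left

data_file = []      # data records: (account_no, overdue_limit)
inv_values = []     # inversion index: sorted distinct overdue_limit values
inv_pointers = []   # inv_pointers[i]: positions of records with inv_values[i]


def add_record(account_no, overdue_limit):
    """Append to the data file and make the matching inversion-index entry."""
    data_file.append((account_no, overdue_limit))
    pos = len(data_file) - 1
    i = bisect_left(inv_values, overdue_limit)
    if i == len(inv_values) or inv_values[i] != overdue_limit:
        # First record with this value: create a new index entry in order.
        inv_values.insert(i, overdue_limit)
        inv_pointers.insert(i, [])
    inv_pointers[i].append(pos)


def find_by_limit(overdue_limit):
    """Binary-search the index, then follow its pointers to the records."""
    i = bisect_left(inv_values, overdue_limit)
    if i < len(inv_values) and inv_values[i] == overdue_limit:
        return [data_file[p] for p in inv_pointers[i]]
    return []


add_record(1, 500)
add_record(2, 1000)
add_record(3, 500)
add_record(4, 250)
```

After these four insertions the index holds the values 250, 500 and 1000 in sorted order, and the entry for 500 points to two data records.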
2. Multi list file organization
- In multi list file organization the index contains all the values that the secondary key has in the
data file, the same as in an inverted file. The difference is that the entry in the multi list index for a
secondary key value is a pointer to the first data record with that key value. That record contains a
pointer to the second record having the same key value, and so on. Thus there is a linked list of data
records for each value of the secondary key. Multi list chains are usually bidirectional and occasionally
circular, to make update operations more efficient.
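The chain structure can be sketched as below, again assuming accounts with overdue limit as the secondary key. To keep the example short the chains are singly linked, and a new record is placed at the head of its chain. Both are simplifying assumptions, not the bidirectional form mentioned above. The names and sample values are hypothetical.

```python
data_file = []   # each record carries a "next" pointer to the following record
index = {}       # secondary key value -> position of the first record in its chain


def add_record(account_no, overdue_limit):
    """Add a record and link it in as the new head of its chain."""
    pos = len(data_file)
    data_file.append({
        "account_no": account_no,
        "overdue_limit": overdue_limit,
        "next": index.get(overdue_limit),  # old head, or None if chain is new
    })
    index[overdue_limit] = pos


def find_by_limit(overdue_limit):
    """Start at the index entry and follow the chain through the data file."""
    accounts = []
    pos = index.get(overdue_limit)
    while pos is not None:
        record = data_file[pos]
        accounts.append(record["account_no"])
        pos = record["next"]
    return accounts


add_record(1, 500)
add_record(2, 1000)
add_record(3, 500)
```

Unlike the inverted file, the index here stores one pointer per key value, and the remaining pointers live inside the data records themselves.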