Reading and Chunking Really Large Files With Python
In this tutorial, we will discuss handling files and resources in Python. We will write sample code for different file operations such as read, write, and append. We will also see how Python eases the reading of large text files and images, and we will use context managers while performing these operations to prevent resource leaks.
Open File in Python
We call the built-in function open() to open a file in Python. This function takes a number of arguments, but the only required parameter is the path to the file. It returns a file object whose type depends on the mode. Below is the signature of open():

    open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)

In the above definition, file is the path to the file.
mode is an optional string that specifies the mode in which the file is opened. It defaults to 'r', which means open for reading in text mode. We will discuss other modes later.
buffering is an optional integer used to set the buffering policy. Binary files are buffered in fixed-size chunks.
encoding tells the Python runtime which encoding the file uses. This should only be used in text mode.
errors is an optional string that specifies how encoding errors are to be handled; this argument should not be used in binary mode.
newline controls how universal newlines mode works; it applies only to text mode.
If closefd is False, the underlying file descriptor will be kept open when the file is closed. This only applies when a file descriptor, rather than a file name, is passed as file.
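To make the errors parameter concrete, here is a minimal sketch, assuming a scratch file created with tempfile (the file name and its contents are hypothetical). It writes bytes that are not valid UTF-8 and then reads them back with errors='replace'.

```python
import os
import tempfile

# Hypothetical scratch file used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), 'latin1.txt')

# Write Latin-1 bytes; 0xE9 followed by '!' is not a valid UTF-8 sequence.
f = open(path, mode='wb')
f.write(b'caf\xe9!')
f.close()

# errors='replace' substitutes U+FFFD for undecodable bytes instead of raising.
f = open(path, mode='rt', encoding='utf-8', errors='replace')
text = f.read()
f.close()

print(repr(text))
```

With the default errors setting (strict handling), the same read would raise UnicodeDecodeError instead.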
Python file open modes
| Character | Meaning |
|---|---|
| 'r' | open for reading (default) |
| 'w' | open for writing, truncating the file first |
| 'x' | create a new file and open it for writing |
| 'a' | open for writing, appending to the end of the file if it exists |
| 'b' | binary mode |
| 't' | text mode (default) |
| '+' | open a disk file for updating (reading and writing) |
Example

    f = open('test.txt', mode='wt', encoding='utf-8')

Here, w means write and t means text. Every mode string must contain exactly one of the read ('r'), write ('w'), exclusive-create ('x'), or append ('a') characters.
Writing to File in Python
The write() method is used to write to a file. It accepts the text to be written as an argument and returns the number of characters (code points) written, not the number of bytes. Always remember to call close() after any file operation. Below is an example.

    f = open('test.txt', mode='wt', encoding='utf-8')
    f.write('Hello There! ')
    f.write('I am learning Python \n')
    f.write('What are you learning?')
    f.close()

The above snippet creates the file if it does not exist and writes the given text to it. If the file exists, it is truncated and the new text is written. If we do not want to overwrite the text already present in the file, we can use the 'a' character, meaning append.
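The following is a minimal sketch of append mode and of the value write() returns, assuming a scratch file created with tempfile (the path and the sample text are hypothetical, not from the tutorial).

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'test.txt')  # hypothetical scratch file

f = open(path, mode='wt', encoding='utf-8')
chars_written = f.write('h\u00e9llo')   # 'héllo': 5 characters, but 'é' takes 2 bytes in UTF-8
f.close()
size_in_bytes = os.path.getsize(path)

f = open(path, mode='at', encoding='utf-8')
f.write(' appended')                    # 'a' keeps the existing content and adds to the end
f.close()

f = open(path, mode='rt', encoding='utf-8')
content = f.read()
f.close()

print(chars_written, size_in_bytes)
print(content)
```

The character count and the byte count differ here because one character needs two bytes in UTF-8.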
While writing or reading files, we can call seek(0) at any time to move the file pointer to the beginning of the file. The next write then starts from the beginning and overwrites whatever text is in its place. It does not replace all of the text; it overwrites only as many characters as the new text needs.

    f = open('test.txt', mode='wt', encoding='utf-8')
    f.write('Hello There! ')
    f.write('I am learning Python \n')
    f.seek(0)
    f.write('overridden ')
    f.close()

When the above code executes, 'Hello There' is overwritten by the text 'overridden ' because seek() moved the pointer to the start of the file.
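Instead of the write() method, we can also use writelines() to write multiple lines at once. Note that writelines() does not add line separators, so each string should end with '\n' where a line break is wanted.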
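A minimal sketch of writelines(), assuming a scratch file created with tempfile (the path and the sample lines are hypothetical):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'lines.txt')  # hypothetical scratch file

lines = ['first line\n', 'second line\n', 'third line\n']

f = open(path, mode='wt', encoding='utf-8')
f.writelines(lines)   # each string is written as-is; no '\n' is added for you
f.close()

f = open(path, mode='rt', encoding='utf-8')
result = f.read()
f.close()

print(result)
```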
Reading Text File in Python
The read() method is used to read a file in Python. It accepts an optional parameter, the number of characters (or bytes, in binary mode) to read, and returns text for a text file or binary data for a binary file.
Example

    f = open('test.txt', mode='rt', encoding='utf-8')
    chunk_text = f.read(12)
    print(chunk_text)
    print('*******************')
    text = f.read()
    print(text)
    f.close()

The first call, read(12), reads only 12 characters from the beginning of the text file, and the following read() returns all the remaining text.
Output

    Hello There!
    *******************
     I am learning Python 
    What are you learning?

It is always advisable to read files in chunks if you are not sure about the size of the file or if the file is large.
We can also read a text file line by line using readline() or readlines(). The readlines() method returns all the lines of the text file, split at line breaks (\n), as a list, whereas readline() reads one line at a time. Every line ends with '\n' except possibly the last line.
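The difference between readline() and readlines() can be sketched as follows, assuming a scratch file that holds the same two lines written earlier (the tempfile path is hypothetical):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'test.txt')  # hypothetical scratch file

f = open(path, mode='wt', encoding='utf-8')
f.write('Hello There! I am learning Python \nWhat are you learning?')
f.close()

f = open(path, mode='rt', encoding='utf-8')
first = f.readline()    # one line, including its trailing '\n'
rest = f.readlines()    # all remaining lines, as a list
at_end = f.readline()   # an empty string signals end of file
f.close()

print(repr(first))
print(rest)
print(repr(at_end))
```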
Reading Files as Iterator
The readlines() method reads all the text at once, which is not suitable for large files, and readline() is awkward for large files as well because we may not know how many calls are needed to read all the text. To overcome this, we can iterate over the file object itself. It reads the text file one line at a time until the end of the file. Below is an example.

    f = open('test.txt', mode='rt', encoding='utf-8')
    for line in f:
        print(line, end='')
    f.close()

Using Context Managers while Reading File
Python provides the with block, which uses context managers to force resource cleanup after any I/O operation.
All the examples shown above use the close() method to close the file after completing the read or write operation. But what if an exception occurs and close() is never executed? In that case, the open file handle can leak. We can avoid this by putting the close() call inside a finally block. But Python provides a cleaner way to achieve this using context managers.
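For comparison, the finally-based approach mentioned above might look like the sketch below. The scratch file and the simulated RuntimeError are assumptions made purely for illustration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'test.txt')  # hypothetical scratch file

f = open(path, mode='wt', encoding='utf-8')
f.write('line one\nline two\n')
f.close()

f = open(path, mode='rt', encoding='utf-8')
try:
    first_line = f.readline()
    # Simulated failure standing in for any error during processing.
    raise RuntimeError('simulated failure while processing')
except RuntimeError as exc:
    error_message = str(exc)
finally:
    f.close()   # runs whether or not an exception was raised

closed_after = f.closed
print(closed_after, error_message)
```

The with statement performs the same cleanup with less ceremony.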

    with open('test.txt', mode='rt', encoding='utf-8') as f:
        for line in f:
            print(line, end='')

Though we still have access to the file object outside the with block, we can't perform any file operations on it there because the file is already closed at the end of the with block.
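A small sketch of what happens when a file object is used after its with block has ended, assuming a scratch file created with tempfile (the path and contents are hypothetical):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'test.txt')  # hypothetical scratch file

with open(path, mode='wt', encoding='utf-8') as f:
    f.write('line one\nline two\n')

with open(path, mode='rt', encoding='utf-8') as f:
    first_line = f.readline()

is_closed = f.closed      # the name f is still bound, but the file is closed

try:
    f.read()
    read_failed = False
except ValueError:
    read_failed = True    # reading from a closed file raises ValueError

print(is_closed, read_failed)
```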
Copying One File's Content to Another File in Python
We can use an iterator to read each line from the source file and write it to another file, creating a copy of the file in Python. Below is a sample implementation.

    with open('test.txt', mode='rt') as f:
        with open('test1.txt', mode='wt') as g:
            for line in f:
                g.write(line)

Reading and Writing Binary Files or Images in Python
The abstraction layer for reading and writing an image or binary file is very similar to that for text files. We only need to specify the file mode as binary.

    with open('image.PNG', mode='rb') as f:
        with open('image1.PNG', mode='wb') as g:
            for line in f:
                g.write(line)

Reading Large File in Python
Because of memory constraints, it is always recommended to read large files in chunks. To read a large file in chunks, we can use the read() function with a while loop to read one chunk of data from a text file at a time.
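One way to package this pattern for reuse is a small generator. The sketch below is illustrative only: read_in_chunks is a hypothetical helper name, not a standard-library function, and the 250-character scratch file exists just to show the chunk sizes.

```python
import os
import tempfile


def read_in_chunks(file_obj, chunk_size=100):
    """Yield successive chunks of up to chunk_size characters (or bytes)."""
    while True:
        chunk = file_obj.read(chunk_size)
        if not chunk:       # '' (text) or b'' (binary) means end of file
            break
        yield chunk


path = os.path.join(tempfile.mkdtemp(), 'big.txt')  # hypothetical scratch file

with open(path, mode='wt', encoding='utf-8') as f:
    f.write('x' * 250)

with open(path, mode='rt', encoding='utf-8') as f:
    sizes = [len(chunk) for chunk in read_in_chunks(f, chunk_size=100)]

print(sizes)
```

Only one chunk is held in memory at a time, so the approach works the same way for files far larger than this one. The tutorial's own loop, which follows, does the same thing inline.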

    with open('test.txt', mode='rt') as f:
        text = f.read(100)  # Reads the first 100 characters and moves the pointer to the 101st character
        while len(text) > 0:
            print(text)
            text = f.read(100)  # Moves the pointer to the end of the next 100 characters

Conclusion
In this tutorial, we discussed handling files and resources in Python. We implemented sample code for different file operations such as read, write, and append. We also saw how Python eases the reading of large text files and images, and we used context managers while performing file operations to prevent resource leaks.
Source: https://www.devglan.com/python/file-handling-in-python