Hello, everyone, I am the talented brother.
Today I went home from work and received a question from my friend. He needed to get the date and time when processing date-type data, but his car overturned. So, in this q&A, we will introduce some common time attribute acquisition techniques in daily data processing.
Of course, the simplest way I said here should be the easiest to think of under unfamiliar circumstances, which is what I did a year ago.
Let’s do it!!
The case data will be replied to 955 in the background and collected through the q&A folder.
Directory:
[toc]
1. General breakdown
Those of you who are familiar with Excel operations should be more likely to use separate operations.
1.1. Excel operation process
In this video, we’re going to try excel.
Open the file, select the data area, and set it in the menu bar of Data – > Catalog operation. Visually, it should be classified by space, but there is a problem (no response) when you select it.
In this case, don’t panic. There is a good chance that the delimiter you see may not be usable. You can manually copy what you think of as a “space” and paste it into another one.
Use it.
At this point, we seem to have a pretty good idea of what caused our friend to roll over. It’s possible that he used a space delimiter to sort things out in Python as well. What should python do with sorting in this case?
2.1. Python Operation process
Read the data preview and see what happens
import pandas as pd
df = pd.read_excel(R '.data 'time to process case data.xlsx')
df.head()
Copy the code
Take a quick look at field types and so on
df.info()
Copy the code
You can see that both fields are of type Object.
Here’s a look at the rollover process operated by friends:
ValueError: ValueError: ValueError: ValueError: ValueError: ValueError: ValueError
Time to answer questions
Normally, I print a single piece of data and see what happens
df['Resource allocation time'] [0]
Copy the code
'2021-09-01\xa016:50:56'
Copy the code
Sure enough, we found that the naked eye actually saw the space is \xa0, directly use the space “” used for sorting naturally not work.
The space we usually use is \x20, which is in the range of standard ASCII visible characters 0x20 to 0x7e.
While \xa0 belongs to the extended character set character in LATin1 (ISO/IEC_8859-1) and represents the uninterrupted whitespace character
So, we can break it down like this:
df['Resource allocation time'].str.split('\xa0')
Copy the code
0 [2021-09-01.16:50:56]
1 [2021-09-01.16:50:56]
2 [2021-09-01.16:50:56]
3 [2021-09-01.16:50:56]
4 [2021-09-01.16:50:56]...560 [2021-09-13.21:05:14]
561 [2021-09-14.09:33:02]
562 [2021-09-14.09:33:02]
563 [2021-09-14.10:26:42]
564 [2021-09-14.10:29:58]
Name: resource allocation time,Length: 565, dtype: object
Copy the code
Of course, for the above result, slice selection according to the operation of friends also does not meet the requirements, please see:
df['Resource allocation time'].str.split('\xa0') [0]
Copy the code
['2021-09-01', '16:50:56']
Copy the code
So, how do you implement requirements? To continue, we need to introduce the apply function method and call a slice processing:
df['Resource allocation time'].apply(lambda x : x.split('\xa0') [0])
Copy the code
0 2021-09-01
1 2021-09-01
2 2021-09-01
3 2021-09-01
4 2021-09-01.560 2021-09-13
561 2021-09-14
562 2021-09-14
563 2021-09-14
564 2021-09-14
Name: resource allocation time,Length: 565, dtype: object
Copy the code
In fact, the easiest way to itemize in this case is to do it without delimiters. See:
df['Resource allocation time'].apply(lambda x : x.split()[0])
Copy the code
0 2021-09-01
1 2021-09-01
2 2021-09-01
3 2021-09-01
4 2021-09-01.560 2021-09-13
561 2021-09-14
562 2021-09-14
563 2021-09-14
564 2021-09-14
Name: resource allocation time,Length: 565, dtype: object
Copy the code
This is the syntax for scheduling, but pandas has a variety of flexible methods for handling time-typed data.
2. Pandas time visitors
For your friend’s needs, you can see the line of code he commented out, well, it’s actually easier to use.
Pandas can easily recognize and convert data that looks like time to a time type (pd.to_datetime()), and astype can be used to cast it.
pd.to_datetime(df['Resource allocation time'])
Copy the code
0 2021-09-01 16:50:56
1 2021-09-01 16:50:56
2 2021-09-01 16:50:56
3 2021-09-01 16:50:56
4 2021-09-01 16:50:56.560 2021-09-13 21:05:14
561 2021-09-14 09:33:02
562 2021-09-14 09:33:02
563 2021-09-14 10:26:42
564 2021-09-14 10:29:58
Name: resource allocation time,Length: 565, dtype: datetime64[ns]
Copy the code
As you can see, dType becomes dateTime64 [ns] time type.
For time-type data, PANDAS provides a bunch of handy ways to play around with it. For the needs of friends, we directly use the time accessor. Dt.< properties >
pd.to_datetime(df['Resource allocation time']).dt.date
Copy the code
0 2021-09-01
1 2021-09-01
2 2021-09-01
3 2021-09-01
4 2021-09-01.560 2021-09-13
561 2021-09-14
562 2021-09-14
563 2021-09-14
564 2021-09-14
Name: resource allocation time,Length: 565, dtype: object
Copy the code
It also has the following properties for you to choose from:
# time
pd.to_datetime(df['Resource allocation time']).dt.time
Copy the code
0 16:50:56
1 16:50:56
2 16:50:56
3 16:50:56
4 16:50:56.560 21:05:14
561 09:33:02
562 09:33:02
563 10:26:42
564 10:29:58
Name: resource allocation time,Length: 565, dtype: object
Copy the code
More properties are as follows, you can demonstrate yourself familiar with:
# years
pd.to_datetime(df['Resource allocation time']).dt.year
In #
pd.to_datetime(df['Resource allocation time']).dt.month
#,
pd.to_datetime(df['Resource allocation time']).dt.day
# hours
pd.to_datetime(df['Resource allocation time']).dt.hour
# points
pd.to_datetime(df['Resource allocation time']).dt.minute
# s
pd.to_datetime(df['Resource allocation time']).dt.second
# and more (not expanded here)
Copy the code
For more information about pandas’ time series data, we will cover the following topics. The scenes are very common and the techniques are easy to use once you get used to them.
The text will be used to convert timestamps to dates. The text will be used to convert timestamps to dates. The text will be used to convert timestamps to dates.
Warm reminder: The case data should be replied to 955 in the background and obtained through the q&A folder.