Hello, everyone, I am the talented brother.

Today I went home from work and received a question from my friend. He needed to get the date and time when processing date-type data, but his car overturned. So, in this q&A, we will introduce some common time attribute acquisition techniques in daily data processing.

Of course, the simplest way I said here should be the easiest to think of under unfamiliar circumstances, which is what I did a year ago.

Let’s do it!!

The case data will be replied to 955 in the background and collected through the q&A folder.

Directory:

[toc]

1. General breakdown

Those of you who are familiar with Excel operations should be more likely to use separate operations.

1.1. Excel operation process

In this video, we’re going to try excel.

Open the file, select the data area, and set it in the menu bar of Data – > Catalog operation. Visually, it should be classified by space, but there is a problem (no response) when you select it.

In this case, don’t panic. There is a good chance that the delimiter you see may not be usable. You can manually copy what you think of as a “space” and paste it into another one.

Use it.

At this point, we seem to have a pretty good idea of what caused our friend to roll over. It’s possible that he used a space delimiter to sort things out in Python as well. What should python do with sorting in this case?

2.1. Python Operation process

Read the data preview and see what happens

import pandas as pd

df = pd.read_excel(R '.data 'time to process case data.xlsx')
df.head()
Copy the code

Take a quick look at field types and so on

df.info()
Copy the code

You can see that both fields are of type Object.

Here’s a look at the rollover process operated by friends:

ValueError: ValueError: ValueError: ValueError: ValueError: ValueError: ValueError

Time to answer questions

Normally, I print a single piece of data and see what happens

df['Resource allocation time'] [0]
Copy the code
'2021-09-01\xa016:50:56'
Copy the code

Sure enough, we found that the naked eye actually saw the space is \xa0, directly use the space “” used for sorting naturally not work.

The space we usually use is \x20, which is in the range of standard ASCII visible characters 0x20 to 0x7e.

While \xa0 belongs to the extended character set character in LATin1 (ISO/IEC_8859-1) and represents the uninterrupted whitespace character

So, we can break it down like this:

df['Resource allocation time'].str.split('\xa0')
Copy the code
0      [2021-09-01.16:50:56]
1      [2021-09-01.16:50:56]
2      [2021-09-01.16:50:56]
3      [2021-09-01.16:50:56]
4      [2021-09-01.16:50:56]...560    [2021-09-13.21:05:14]
561    [2021-09-14.09:33:02]
562    [2021-09-14.09:33:02]
563    [2021-09-14.10:26:42]
564    [2021-09-14.10:29:58]
Name: resource allocation time,Length: 565, dtype: object
Copy the code

Of course, for the above result, slice selection according to the operation of friends also does not meet the requirements, please see:

df['Resource allocation time'].str.split('\xa0') [0]
Copy the code
['2021-09-01', '16:50:56']
Copy the code

So, how do you implement requirements? To continue, we need to introduce the apply function method and call a slice processing:

df['Resource allocation time'].apply(lambda x : x.split('\xa0') [0])
Copy the code
0      2021-09-01
1      2021-09-01
2      2021-09-01
3      2021-09-01
4      2021-09-01.560    2021-09-13
561    2021-09-14
562    2021-09-14
563    2021-09-14
564    2021-09-14
Name: resource allocation time,Length: 565, dtype: object
Copy the code

In fact, the easiest way to itemize in this case is to do it without delimiters. See:

df['Resource allocation time'].apply(lambda x : x.split()[0])
Copy the code
0      2021-09-01
1      2021-09-01
2      2021-09-01
3      2021-09-01
4      2021-09-01.560    2021-09-13
561    2021-09-14
562    2021-09-14
563    2021-09-14
564    2021-09-14
Name: resource allocation time,Length: 565, dtype: object
Copy the code

This is the syntax for scheduling, but pandas has a variety of flexible methods for handling time-typed data.

2. Pandas time visitors

For your friend’s needs, you can see the line of code he commented out, well, it’s actually easier to use.

Pandas can easily recognize and convert data that looks like time to a time type (pd.to_datetime()), and astype can be used to cast it.

pd.to_datetime(df['Resource allocation time'])
Copy the code
0     2021-09-01 16:50:56
1     2021-09-01 16:50:56
2     2021-09-01 16:50:56
3     2021-09-01 16:50:56
4     2021-09-01 16:50:56.560   2021-09-13 21:05:14
561   2021-09-14 09:33:02
562   2021-09-14 09:33:02
563   2021-09-14 10:26:42
564   2021-09-14 10:29:58
Name: resource allocation time,Length: 565, dtype: datetime64[ns]
Copy the code

As you can see, dType becomes dateTime64 [ns] time type.

For time-type data, PANDAS provides a bunch of handy ways to play around with it. For the needs of friends, we directly use the time accessor. Dt.< properties >

pd.to_datetime(df['Resource allocation time']).dt.date
Copy the code
0      2021-09-01
1      2021-09-01
2      2021-09-01
3      2021-09-01
4      2021-09-01.560    2021-09-13
561    2021-09-14
562    2021-09-14
563    2021-09-14
564    2021-09-14
Name: resource allocation time,Length: 565, dtype: object
Copy the code

It also has the following properties for you to choose from:

# time
pd.to_datetime(df['Resource allocation time']).dt.time
Copy the code
0      16:50:56
1      16:50:56
2      16:50:56
3      16:50:56
4      16:50:56.560    21:05:14
561    09:33:02
562    09:33:02
563    10:26:42
564    10:29:58
Name: resource allocation time,Length: 565, dtype: object
Copy the code

More properties are as follows, you can demonstrate yourself familiar with:

# years
pd.to_datetime(df['Resource allocation time']).dt.year
In #
pd.to_datetime(df['Resource allocation time']).dt.month
#,
pd.to_datetime(df['Resource allocation time']).dt.day
# hours
pd.to_datetime(df['Resource allocation time']).dt.hour
# points
pd.to_datetime(df['Resource allocation time']).dt.minute
# s
pd.to_datetime(df['Resource allocation time']).dt.second
# and more (not expanded here)
Copy the code

For more information about pandas’ time series data, we will cover the following topics. The scenes are very common and the techniques are easy to use once you get used to them.

The text will be used to convert timestamps to dates. The text will be used to convert timestamps to dates. The text will be used to convert timestamps to dates.

Warm reminder: The case data should be replied to 955 in the background and obtained through the q&A folder.